Guide to line ending handling in Git

Clickbait title: Improve your Git experience with this one weird trick StackOverflow hates

If you're reading this guide, it's because you're either wondering about, or have actively run into a problem with, Git's handling of line endings. Maybe you're a Windows user, or someone working on the same repository as you is, or you do development across operating systems, or you're working with tools that break operating system convention, or...

You've probably looked it up and know about core.autocrlf and git add --renormalize, but you're still not really sure whether they're doing the right thing, or what they do to begin with. Or you did it once, and now need to look it up again because the documentation just doesn't explain it well.

This guide serves to clear up how Git handles line endings, how to avoid problems with them, how to fix problems if they come up, and some great tools for dealing with line endings in general.

This guide pulls from a few different sources, mainly:

git-config(1)
gitattributes(5)
git-ls-files(1)
dos2unix(1)
GitHub's documentation on line endings

As well as some experimenting.

How did we get here?

This section mostly contains some fun nerd facts.

Before computers, there were typewriters, which commonly used a lever mechanism to *return* the *carriage* to the left side of the page, and a wheel to *"feed"* the typewriter a new *line* of paper to write on.

Later on, typewriters became electric, and got hooked up to computer systems, becoming the "teletype" (the origin of the term "tty" in the Unix world), and received a bunch of different control characters to control how text was printed. Among them, the "Carriage Return" (CR) and "Line Feed" (LF).

These were later adopted into all sorts of digital text encodings, including ASCII, which later evolved into Unicode, and encodings like UTF-8 which are (almost) universally used by everything nowadays.

And so it happened that the sequence of CR and LF became the line ending for many computer systems of the time. Windows uses it to this very day! However, in the Unix world, this was seen as unnecessarily complicated to deal with, so they adopted the following model:

Pressing the return key produced a CR character as input
The operating system converts it to an LF character and passes it to the program
When the program prints an LF character, the operating system converts it to a CR LF sequence before sending it out to the terminal (the screen)

This way, programs and files only ever had to deal with one line ending character, LF ('\n' in basically every programming language), but input and output still worked with the old standards.

You can still see this mechanism on modern Unix systems like Linux: Running stty -onlcr turns off the LF -> CR LF conversion on output, causing LF characters to not move the cursor to the left of the terminal, only down a line!

$ echo hi; echo hi
hi
hi

$ stty -onlcr
$ echo hi; echo hi
hi
  hi

Old Mac systems (pre-OS X) did something similar, but used CR as the line ending character instead. OS X switched to the Unix convention of using LF.

Another place where CR LF prevails is in networking protocols like HTTP.
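The difference between the two conventions is easy to see at the byte level. Here is a minimal sketch using only printf, od and wc (nothing Git-specific yet); the file names are made up for illustration.

```shell
# Write the same two lines with Unix (LF) and Windows (CR LF) endings.
tmp=$(mktemp -d)
printf 'hi\nhi\n'     > "$tmp/unix.txt"
printf 'hi\r\nhi\r\n' > "$tmp/dos.txt"

# od -c prints CR as \r and LF as \n, which makes the endings visible.
od -c "$tmp/unix.txt"
od -c "$tmp/dos.txt"

# The CR LF version is one byte longer per line.
unix_bytes=$(wc -c < "$tmp/unix.txt" | tr -d ' ')
dos_bytes=$(wc -c < "$tmp/dos.txt" | tr -d ' ')
echo "LF: $unix_bytes bytes, CR LF: $dos_bytes bytes"
```

The two files differ only in the extra \r before each \n.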

How does Git handle line endings?

By default, not at all. Everything you write gets committed as-is, and when you do a checkout, everything comes back as-is too.

Git will only do line ending conversion on files it considers "text". You can mark a file as text explicitly by assigning it the text attribute (more on Git attributes in a bit), or you can enable the core.autocrlf option, in which case Git will automatically check every single file and try to guess if it's a text file or not.
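To make the difference concrete, here is a small sketch in a throwaway repository. The file names are hypothetical, and core.autocrlf is passed with -c on the command line so the result doesn't depend on your own configuration.

```shell
# Throwaway repository; all file names here are illustrative.
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo" || exit 1

printf 'one\r\ntwo\r\n' > as-is.txt
printf 'one\r\ntwo\r\n' > converted.txt

# Conversion off: the CR LF bytes go into the index untouched.
git -c core.autocrlf=false add as-is.txt

# core.autocrlf=input: Git guesses "text" and stores LF in the index,
# leaving the worktree copy alone. A warning on stderr is expected.
git -c core.autocrlf=input add converted.txt

eol_report=$(git ls-files --eol)
echo "$eol_report"
```

The first file should show up as i/crlf w/crlf, the second as i/lf w/crlf. The git ls-files --eol command used to inspect this is covered further down.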

core.autocrlf is fine, it works, but it has a number of issues:

It is poorly explained and poorly understood
It overrides the defaults in unintuitive ways
It provides a "solution" to a much more subtle problem
It causes unintuitive warnings and has a risk of corrupting files

And maybe worst of all: it causes people to overlook a very powerful feature of Git that addresses the problem in a much better way: Git attributes.

`.gitattributes`

Similar to the .gitignore file using patterns to exclude certain files from being checked in, the .gitattributes file uses patterns to assign certain attributes to files.

Attributes can tell Git...

What encoding to convert the file to when checking it out (files will be checked in as UTF-8)
What "filters" to apply to the file on checkin/checkout
If and how to generate diff text for the file
If and how to perform three-way merges on the file
Which types of whitespace problems to check for
Whether to include the file in a git archive

But for now, only one attribute is important, text.

It can take 3 forms:

text, which enables line ending conversion
-text, which disables conversion
text=auto, which makes Git guess
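You can ask Git which form applies to a given path with git check-attr. A minimal sketch, assuming a scratch repository and made-up file patterns:

```shell
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo" || exit 1

# One pattern per form of the text attribute.
cat > .gitattributes <<'EOF'
*.txt text
*.png -text
*.dat text=auto
EOF

# check-attr works on path names; the files don't need to exist.
text_attrs=$(git check-attr text -- notes.txt logo.png blob.dat other.xyz)
echo "$text_attrs"
```

Git reports these as set, unset and auto respectively, and as unspecified for a path that no pattern matches.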

Importantly, the core.autocrlf option acts as if every file has text=auto, meaning that once it is enabled, Git makes this guess for all files. It's exactly like writing * text=auto in your .gitattributes file.

Besides -text, Git also has the binary attribute, which is a shorthand for -text -diff -merge. Setting -diff can be useful for large, generated files like package lock files, too.
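To see what the binary shorthand expands to, git check-attr -a lists every attribute set for a path. This is a sketch with hypothetical patterns in a scratch repository:

```shell
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo" || exit 1

cat > .gitattributes <<'EOF'
*.png binary
package-lock.json -diff
EOF

# -a lists all attributes that apply to the path.
binary_attrs=$(git check-attr -a -- logo.png)
lock_attrs=$(git check-attr -a -- package-lock.json)
echo "$binary_attrs"
echo "$lock_attrs"
```

The PNG should come back with text, diff and merge all unset, while the lock file only has diff unset and is still treated normally otherwise.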

`core.eol` and the `eol` attribute

The core.eol option tells Git what line endings to use on checkout. The possible values are lf, crlf and native (the default). It is overridden by core.autocrlf, with true forcing CR LF, and input forcing LF.
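A quick way to watch core.eol at work is to check a file out with the option set for that single command. This is only a sketch: the repository and file name are made up, and core.autocrlf is forced off so it can't override the result.

```shell
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo" || exit 1

echo '* text=auto' > .gitattributes
printf 'one\ntwo\n' > notes.txt
git -c core.autocrlf=false add .gitattributes notes.txt

# Delete the worktree copy and check it out again with core.eol=crlf.
rm notes.txt
git -c core.autocrlf=false -c core.eol=crlf checkout -- notes.txt

notes_eol=$(git ls-files --eol -- notes.txt)
echo "$notes_eol"
```

The index still holds LF, but the freshly checked out copy should now have CR LF, even on Linux.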

In addition to the text attribute, Git also has the eol attribute, which lets you specify the desired line ending to use.

For example, you may want POSIX shell scripts to always use LF, even on Windows:

*.sh        eol=lf

or always use CR LF for Visual Studio files:

*.sln       eol=crlf

eol implies text, but you can also specify it with text=auto.
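Here is how those two rules play out in practice, sketched in a scratch repository with hypothetical file names. Both files start out with LF, get staged, and are then re-created from the index.

```shell
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo" || exit 1

cat > .gitattributes <<'EOF'
*.sh  eol=lf
*.sln eol=crlf
EOF

# Both files start out with LF endings.
printf 'echo hi\n' > build.sh
printf 'Project\n' > app.sln
# A warning that LF will be replaced by CRLF in app.sln is expected.
git add .gitattributes build.sh app.sln

# Re-create the worktree copies from the index.
rm build.sh app.sln
git checkout -- build.sh app.sln

eol_report=$(git ls-files --eol -- build.sh app.sln)
echo "$eol_report"
```

After the checkout, build.sh should still be w/lf while app.sln comes back as w/crlf, regardless of the platform.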

Detecting incorrect line endings

The command git ls-files --eol shows you exactly what line endings your files have in the index (files that are checked in) and your worktree (files that are checked out) as well as which text and eol attributes are set for them.

The format is i/<status> w/<status> attr/<attribute> <filename>, where <status> is one of -text, none, lf, crlf, or mixed. If Git doesn't show anything, that means the file is not a regular file, or is not present in the index or worktree.
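To get a feel for the different statuses, this sketch stages one made-up file of each kind, with no attributes and conversion turned off:

```shell
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo" || exit 1

printf 'a\nb\n'     > lf.txt
printf 'a\r\nb\r\n' > crlf.txt
printf 'a\r\nb\n'   > mixed.txt
printf 'no newline' > none.txt
printf 'a\000b'     > binary.dat   # a NUL byte makes Git treat it as binary

git -c core.autocrlf=false add .
eol_report=$(git ls-files --eol)
echo "$eol_report"
```

Each file should be listed with the status its name suggests, with binary.dat showing up as -text. The attr/ column stays empty because no attributes are set.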

A text file is "normalized" if it shows as i/lf, meaning it is stored using only LF in Git's database. Files that are i/crlf or i/mixed will not be converted by Git when you check them out.

The git add --renormalize . command is used for re-adding files to the index after changing the text attribute or fixing the line endings.
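A minimal sketch of that workflow, assuming a scratch repository where a CR LF file was staged before any attributes existed (the file name is made up):

```shell
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo" || exit 1

# A CR LF file staged without attributes is stored as-is.
printf 'one\r\ntwo\r\n' > notes.txt
git -c core.autocrlf=false add notes.txt
before=$(git ls-files --eol -- notes.txt)

# Declare it text, then re-run the conversion for all tracked files.
# A warning that CRLF will be replaced by LF is expected here.
echo '*.txt text' > .gitattributes
git add .gitattributes
git add --renormalize .
after=$(git ls-files --eol -- notes.txt)

printf 'before: %s\nafter:  %s\n' "$before" "$after"
```

The index entry should go from i/crlf to i/lf. The worktree copy is not touched, so it stays w/crlf until you convert it or check it out again.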

To prevent you from accidentally committing files with mixed line endings, or falsely committing binary files that Git thinks are text, Git has the core.safecrlf option. By default it only prints a warning, but if you set it to true it will completely prevent you from doing something that could cause an incorrect conversion to take place.
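The two modes can be compared side by side. In this sketch (scratch repository, hypothetical file name) a file with mixed line endings is added once with core.safecrlf=true and once with warn:

```shell
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo" || exit 1

echo '*.txt text eol=lf' > .gitattributes
printf 'one\r\ntwo\n' > mixed.txt   # CR LF and LF in the same file

# With core.safecrlf=true the add is refused, because the CR LF
# would not survive a commit/checkout round trip.
if git -c core.safecrlf=true add mixed.txt; then
    strict_result=accepted
else
    strict_result=rejected
fi

# With warn (the default) the same add goes through with a warning.
if git -c core.safecrlf=warn add mixed.txt; then
    warn_result=accepted
else
    warn_result=rejected
fi
echo "true: $strict_result, warn: $warn_result"
```

The strict run should fail with a fatal error naming the file, while the second run only prints a warning and stages the normalized content.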

Fixing incorrect line endings

One of the easiest ways to fix incorrect line endings is through use of the dos2unix or unix2dos tools. These tools have several handy features for dealing with legacy file formats and encodings, but most commonly they're used for converting from one type of line ending to the other. They are available in Git for Windows by default, and can be installed easily on most operating systems.

Once you've fixed everything, check the result with git ls-files --eol, re-stage the files with git add --renormalize ., verify the staged changes with git diff --cached --check, and commit.
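Put together, the whole fix might look something like the sketch below. The repository and file name are made up. The tr fallback is an assumption added only so the sketch also runs where dos2unix isn't installed, and git diff --cached is used so that --check looks at the staged changes.

```shell
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo" || exit 1
echo '*.txt text' > .gitattributes
printf 'one\r\ntwo\r\n' > notes.txt

# Convert in place. dos2unix is the preferred tool; tr is a stand-in.
if command -v dos2unix >/dev/null 2>&1; then
    dos2unix -q notes.txt
else
    tr -d '\r' < notes.txt > notes.tmp && mv notes.tmp notes.txt
fi

git add .gitattributes notes.txt
git add --renormalize .
staged_eol=$(git ls-files --eol -- notes.txt)
echo "$staged_eol"

# --check exits non-zero if the staged changes contain whitespace
# errors, such as a stray CR at the end of a line.
if git diff --cached --check; then
    check_result=clean
else
    check_result=dirty
fi
echo "whitespace check: $check_result"
```

If everything went well, the file shows as i/lf w/lf and the check comes back clean, at which point it's safe to commit.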

If you have any files left over in your worktree with the wrong line endings, you can either convert them as before, or you can delete them and check them out again, which will cause Git to apply the correct line endings.

In the example below, the files were created by cargo new, which always produces files with LF, even on Windows. Note that in this case, core.autocrlf=true was used, so all files have the attributes text=auto eol=crlf set, although ls-files will not show it.

$ git ls-files --eol
i/lf    w/lf    attr/                   .gitignore
i/lf    w/lf    attr/                   Cargo.toml
i/lf    w/lf    attr/                   src/main.rs
$ rm .gitignore Cargo.toml src/main.rs
$ git checkout -- .
Updated 3 paths from the index
$ git ls-files --eol
i/lf    w/crlf  attr/                   .gitignore
i/lf    w/crlf  attr/                   Cargo.toml
i/lf    w/crlf  attr/                   src/main.rs

Conclusion

Using core.autocrlf is fine, but one needs to be aware of how it works.

Setting core.safecrlf to true is heavily recommended to catch errors as early as possible.

.gitattributes files are great for managing how Git treats your files, and there is no reason not to use them, even if only to set * text=auto for people that have core.autocrlf turned off. If you are contributing to a project that doesn't have one already, maybe suggest adding one.

If you are using Git attributes diligently, then you can turn core.autocrlf off, and Git will only do what you tell it to.

There is a global attributes file, pointed to by the core.attributesFile option, so you can set * text=auto in there instead of using core.autocrlf. You can also find its location via git var GIT_ATTR_GLOBAL. One benefit of this is that it will be explicitly shown by git ls-files --eol.

There is also a repository-local one, in .git/info/attributes, which you can use to apply attributes only to your personal clone of a project.
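As a small sketch of the repository-local file, here is a made-up rule added to .git/info/attributes in a scratch repository, followed by a check that Git picks it up:

```shell
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo" || exit 1

# Applies only to this clone; the file is not tracked or shared.
mkdir -p .git/info
echo '*.log -text' >> .git/info/attributes

local_attr=$(git check-attr text -- debug.log)
echo "$local_attr"
```

The rule takes effect even though the repository has no .gitattributes file at all, so it's a handy place for personal overrides.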

LunarLambda/git-eol.md