Clickbait title: Improve your Git experience with this one weird trick StackOverflow hates
If you're reading this guide, it's because you're either wondering about, or have actively run into a problem with, Git's handling of line endings. Maybe you're a Windows user, or someone working on the same repository as you is, or you do development across operating systems, or you're working with tools that break operating system convention, or...
And you've probably looked it up and know about core.autocrlf and git add --renormalize, but you're still not really sure whether they're doing the right thing, or what they do to begin with. Or you did it once, and now need to look it up again because the documentation just doesn't explain it well.
This guide serves to clear up how Git handles line endings, how to avoid problems with them, how to fix problems if they come up, and some great tools for dealing with line endings in general.
This guide pulls from a few different sources, mainly:
- git-config(1)
- gitattributes(5)
- git-ls-files(1)
- dos2unix(1)
- GitHub's documentation on line endings
As well as some experimenting.
This section mostly contains some fun nerd facts.
Before computers, there were typewriters, which commonly used a lever mechanism to *return* the *carriage* to the left side of the page, and a wheel to *"feed"* the typewriter a new *line* of paper to write on. Later on, typewriters became electric and got hooked up to communication lines and, eventually, computer systems, becoming the "teletype" (the origin of the term "tty" in the Unix world), and received a bunch of different control characters to control how text was printed. Among them were the "Carriage Return" (CR) and the "Line Feed" (LF).
These were later adopted into all sorts of digital text encodings, including ASCII, which was later absorbed into Unicode and encodings like UTF-8, which are (almost) universally used by everything nowadays.
And so it happened that the sequence of CR and LF became the line ending for many computer systems of the time. Windows uses it to this very day! However, in the Unix world, this was seen as unnecessarily complicated to deal with, so they adopted the following model:
- Pressing the return key produces a CR character as input
- The operating system converts it to an LF character and passes it to the program
- When the program prints an LF character, the operating system converts it to a CR LF sequence before sending it out to the terminal (the screen)
This way, programs and files only ever had to deal with one line ending character, LF ('\n' in basically every programming language), but input and output still worked with the old standards.
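To make the model concrete, here's a toy Python sketch of the two translations. The names tty_input and tty_output are made up for illustration; they loosely mirror the ICRNL (input) and ONLCR (output) terminal flags, while a real kernel's line discipline does a lot more than this.

```python
def tty_input(keys: bytes) -> bytes:
    """Toy model of ICRNL: a CR typed on the keyboard reaches the program as LF."""
    return keys.replace(b"\r", b"\n")


def tty_output(data: bytes) -> bytes:
    """Toy model of ONLCR: an LF printed by the program reaches the terminal as CR LF."""
    return data.replace(b"\n", b"\r\n")


# The program itself only ever sees and writes a single LF per line.
line = tty_input(b"hi\r")
print(line)              # b'hi\n'
print(tty_output(line))  # b'hi\r\n'
```

Only the terminal side ever deals with CR; the stty flag discussed next switches off the output half of this.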
You can still see this mechanism on modern Unix systems like Linux: running stty -onlcr turns off the LF -> CR LF conversion on output, so LF characters no longer move the cursor back to the left of the terminal, only down a line! (Running stty onlcr turns it back on.)
$ echo hi; echo hi
hi
hi
$ stty -onlcr
$ echo hi; echo hi
hi
hi
Old Mac systems (pre-OSX) did something similar, but used CR as the line ending character instead. OSX switched to the Unix convention of using LF.
Another place where CR LF prevails is in networking protocols like HTTP.
By default, Git doesn't touch line endings at all. Everything you write gets committed as-is, and when you do a checkout, everything comes back as-is too.
Git will only do line ending conversion on files it considers "text". You can mark a file as text explicitly by assigning it the text attribute (more on Git attributes in a bit), or you can enable the core.autocrlf option, in which case Git automatically checks every single file and tries to guess whether it's a text file or not.
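Git's guess is a fairly simple heuristic. The sketch below is my approximation of it in Python, based on a reading of gather_stats() and convert_is_binary() in Git's convert.c; the details may differ between Git versions, and looks_binary is a made-up name. Roughly: a NUL byte or a lone CR makes a file binary, and so does having more than about one non-printable byte per 128 printable ones.

```python
def looks_binary(data: bytes) -> bool:
    """Approximate Git's text=auto check (modeled on convert.c; may vary by version)."""
    printable = nonprintable = 0
    i = 0
    while i < len(data):
        c = data[i]
        if c == 0x0D:  # CR
            if data[i + 1:i + 2] == b"\n":
                i += 2  # a CR LF pair is a normal line ending
                continue
            return True  # a lone CR makes Git treat the file as binary
        if c == 0x0A:  # LF: neither printable nor non-printable
            pass
        elif c == 0x00:
            return True  # any NUL byte makes the file binary
        elif c == 0x7F or (c < 0x20 and c not in (0x08, 0x09, 0x0C, 0x1B)):
            nonprintable += 1  # DEL and most control characters
        else:
            printable += 1  # BS, HT, FF, ESC and every byte >= 0x20 except DEL
        i += 1
    if data.endswith(b"\x1a"):
        nonprintable -= 1  # a trailing Ctrl-Z (DOS end-of-file marker) is forgiven
    # Binary if there are more non-printables than printables / 128.
    return (printable >> 7) < nonprintable


print(looks_binary(b"fn main() {}\r\n"))       # False
print(looks_binary(b"\x89PNG\r\n\x1a\n\x00"))  # True
```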
core.autocrlf is fine and it works, but it has a number of issues:
- It is poorly explained and poorly understood
- It overrides the defaults in unintuitive ways
- It provides a "solution" to a much more subtle problem
- It causes unintuitive warnings and has a risk of corrupting files
And maybe worst of all: it causes people to overlook a very powerful feature of Git that addresses the problem in a much better way: Git attributes.
Just like the .gitignore file uses patterns to exclude certain files from being checked in, the .gitattributes file uses patterns to assign certain attributes to files.
Attributes can tell Git...
- What encoding to convert the file to when checking it out (files will be checked in as UTF-8)
- What "filters" to apply to the file on checkin/checkout
- If and how to generate diff text for the file
- If and how to perform three-way merges on the file
- Which types of whitespace problems to check for
- Whether to include the file in a git archive
But for now, only one attribute is important: text.
It can take 3 forms:
- text, which enables line ending conversion
- -text, which disables conversion
- text=auto, which makes Git guess
Importantly, the core.autocrlf option acts as if every file that doesn't have a text attribute had text=auto, meaning Git makes this guess for all of those files. It's essentially the same as writing * text=auto at the top of your .gitattributes file.
Besides -text, Git also has the binary attribute, which is a shorthand for -text -diff -merge. Setting -diff can be useful for large, generated files like package lock files, too.
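To get a feel for how patterns and attributes interact, here's a rough Python sketch. resolve_attrs and the example rules are hypothetical. It relies on two documented behaviors (when several lines match, a later line overrides an earlier one, and binary expands to -diff -merge -text), but it only handles slash-free patterns matched against the file name, using fnmatch as a stand-in for Git's own pattern matching.

```python
from fnmatch import fnmatchcase
from posixpath import basename

# Built-in macro: "binary" stands for "-diff -merge -text".
MACROS = {"binary": ["-diff", "-merge", "-text"]}


def apply_item(attrs: dict, item: str) -> None:
    """Apply one attribute setting from a .gitattributes line."""
    if item.startswith("-"):
        attrs[item[1:]] = False      # -attr: unset
    elif item.startswith("!"):
        attrs.pop(item[1:], None)    # !attr: back to unspecified
    elif "=" in item:
        key, value = item.split("=", 1)
        attrs[key] = value           # attr=value
    else:
        attrs[item] = True           # attr: set
        for expanded in MACROS.get(item, []):
            apply_item(attrs, expanded)


def resolve_attrs(rules: list[tuple[str, str]], path: str) -> dict:
    """Return the attributes for path; later matching rules override earlier ones.

    Simplification: only patterns without a slash, matched against the file name.
    """
    attrs: dict = {}
    for pattern, settings in rules:
        if fnmatchcase(basename(path), pattern):
            for item in settings.split():
                apply_item(attrs, item)
    return attrs


rules = [("*", "text=auto"), ("*.sh", "eol=lf"), ("*.png", "binary")]
print(resolve_attrs(rules, "scripts/build.sh"))  # {'text': 'auto', 'eol': 'lf'}
print(resolve_attrs(rules, "assets/logo.png"))
# {'text': False, 'binary': True, 'diff': False, 'merge': False}
```

Real Git also matches patterns containing a slash relative to the directory of the .gitattributes file, which this sketch ignores.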
The core.eol option tells Git what line endings to use on checkout for files it treats as text. The possible values are lf, crlf and native (the default, which uses your platform's line ending). It is overridden by core.autocrlf, with true forcing CR LF, and input forcing LF.
In addition to the text attribute, Git also has the eol attribute, which lets you specify the desired line ending to use.
For example, you may want POSIX shell scripts to always use LF, even on Windows:
*.sh eol=lf
or always use CR LF for Visual Studio solution files:
*.sln eol=crlf
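Setting eol implies text, but you can also combine it with text=auto (as in * text=auto eol=lf), in which case Git still checks whether each file is actually text before converting it.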
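Putting core.eol, core.autocrlf and the two attributes together, the following Python sketch is my reading of which line ending a text file gets when Git checks it out, based on the documented precedence. checkout_eol and its parameters are hypothetical. It assumes the file is LF-normalized in the index and, for text=auto, that Git's guess says it's text.

```python
def checkout_eol(text=None, eol=None, autocrlf="false", core_eol="native", native="lf"):
    """Guess the worktree line ending for an LF-normalized file.

    text:     None (unspecified), True (text), False (-text) or "auto"
    eol:      None, "lf" or "crlf" (the eol attribute)
    autocrlf: "false", "true" or "input" (core.autocrlf)
    core_eol: "lf", "crlf" or "native" (core.eol); native is the platform's ending
    Returns "lf", "crlf", or None when Git does no conversion at all.
    """
    if text is False:
        return None  # -text (or binary): never converted, eol is ignored
    if text is None and eol is None and autocrlf == "false":
        return None  # no attributes and no autocrlf: Git's default, no conversion
    if eol is not None:
        return eol  # the eol attribute beats all the config options
    if autocrlf == "true":
        return "crlf"  # core.autocrlf=true overrides core.eol
    if autocrlf == "input":
        return "lf"  # no output conversion, so the LF from the index stays
    return native if core_eol == "native" else core_eol


print(checkout_eol())                                # None
print(checkout_eol(autocrlf="true"))                 # crlf
print(checkout_eol(eol="lf", autocrlf="true"))       # lf
print(checkout_eol(text="auto", native="crlf"))      # crlf
print(checkout_eol(text=True, autocrlf="input", core_eol="crlf"))  # lf
```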
The command git ls-files --eol shows you exactly what line endings your files have in the index (files that are checked in) and your worktree (files that are checked out), as well as which text and eol attributes are set for them.
The format is i/<status> w/<status> attr/<attribute> <filename>, where <status> is one of -text (Git considers the file binary), none (text without any line endings), lf, crlf, or mixed. If a status is empty, the file is not a regular file, or is not present in the index or worktree.
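As a rough illustration of what those statuses mean, here's a Python sketch that classifies a file's contents the way I understand git ls-files --eol does. It's again modeled on Git's convert.c, so treat the binary check as an approximation; eol_status is a made-up name.

```python
def eol_status(data: bytes) -> str:
    """Classify bytes as "-text", "none", "lf", "crlf" or "mixed" (approximation)."""
    crlf = lone_lf = printable = nonprintable = 0
    binary = False
    i = 0
    while i < len(data):
        c = data[i]
        if c == 0x0D and data[i + 1:i + 2] == b"\n":
            crlf += 1  # a CR LF pair
            i += 2
            continue
        if c == 0x0D or c == 0x00:
            binary = True  # lone CR or NUL byte
        elif c == 0x0A:
            lone_lf += 1  # an LF without a CR in front of it
        elif c == 0x7F or (c < 0x20 and c not in (0x08, 0x09, 0x0C, 0x1B)):
            nonprintable += 1
        else:
            printable += 1
        i += 1
    if data.endswith(b"\x1a"):
        nonprintable -= 1  # a trailing Ctrl-Z is forgiven
    if binary or (printable >> 7) < nonprintable:
        return "-text"
    if crlf and lone_lf:
        return "mixed"
    if crlf:
        return "crlf"
    if lone_lf:
        return "lf"
    return "none"  # text without any line endings, e.g. an empty file
```

One consequence, if I'm reading convert.c right: files that use old-Mac-style lone CR line endings come out as -text, so Git won't convert them at all.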
A text file is "normalized" if it shows as i/lf, meaning it is stored using only LF in Git's database. Files that are i/crlf or i/mixed will generally not be converted by Git when you check them out.
The git add --renormalize . command re-adds files to the index after you change the text attribute (or core.autocrlf) or fix the line endings.
To prevent you from accidentally committing files with mixed line endings, or mistakenly committing binary files that Git thinks are text, Git has the core.safecrlf option. By default it only prints a warning, but if you set it to true, Git will refuse to do anything that would cause an irreversible, and therefore incorrect, conversion.
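The check behind core.safecrlf is essentially a round trip: convert the file as if committing it, convert it back as if checking it out, and see whether you get the original bytes. Here's a simplified Python sketch of that idea. checkin, checkout and safecrlf_ok are made-up names, and Git's real check also takes its binary detection into account.

```python
def checkin(data: bytes) -> bytes:
    """Normalize CR LF to LF, as Git does when adding a text file."""
    return data.replace(b"\r\n", b"\n")


def checkout(data: bytes, crlf: bool) -> bytes:
    """Expand LF to CR LF on checkout if crlf is True, otherwise leave it alone."""
    return data.replace(b"\n", b"\r\n") if crlf else data


def safecrlf_ok(worktree: bytes, crlf: bool) -> bool:
    """True if committing and then checking out would reproduce the same bytes."""
    return checkout(checkin(worktree), crlf) == worktree


print(safecrlf_ok(b"a\r\nb\r\n", crlf=True))  # True
print(safecrlf_ok(b"a\r\nb\n", crlf=True))    # False: mixed endings
print(safecrlf_ok(b"a\nb\n", crlf=True))      # False: LF would become CR LF
```

The last case is the situation behind the warning Git prints when you add an LF-only file while core.autocrlf=true; with core.safecrlf=true it becomes an error instead.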
One of the easiest ways to fix incorrect line endings is to use the dos2unix or unix2dos tools. These tools have several handy features for dealing with legacy file formats and encodings, but most commonly they're used for converting from one type of line ending to the other. They come with Git for Windows by default, and can be installed easily on most operating systems.
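If you can't install those tools, the core of their default conversion is easy to approximate. This Python sketch (convert_file is a hypothetical helper, not part of any tool) normalizes a file to LF or CR LF in place and, like dos2unix does by default, skips files that look binary, although its binary check is just a crude test for NUL bytes.

```python
import sys
from pathlib import Path


def convert_file(path: Path, to_crlf: bool = False) -> bool:
    """Rewrite path with LF (or CR LF) line endings; return True if it changed."""
    data = path.read_bytes()
    if b"\0" in data:
        return False  # crude binary check: leave files with NUL bytes alone
    # Normalize to LF first so existing CR LF pairs never become CR CR LF.
    converted = data.replace(b"\r\n", b"\n")
    if to_crlf:
        converted = converted.replace(b"\n", b"\r\n")
    if converted == data:
        return False
    path.write_bytes(converted)
    return True


if __name__ == "__main__":
    # Usage: python convert.py [--crlf] FILE...   (LF is the default target)
    args = sys.argv[1:]
    want_crlf = "--crlf" in args
    for name in args:
        if name != "--crlf" and convert_file(Path(name), want_crlf):
            print(f"converted {name}")
```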
Once you've fixed everything, stage the changes with git add --renormalize ., validate the result with git ls-files --eol, check the staged changes for leftover problems with git diff --cached --check, and commit.
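On a bigger repository, reading git ls-files --eol by eye gets tedious. Here's a small Python sketch that lists tracked files whose index copy isn't normalized. It assumes the output format described earlier, with a tab before the file name as recent Git versions seem to print it, and keep in mind that paths with unusual characters may come out quoted (core.quotePath); find_unnormalized is a made-up name.

```python
import subprocess


def find_unnormalized(ls_files_eol: str) -> list[str]:
    """Return paths whose index status (i/...) is crlf or mixed."""
    problems = []
    for line in ls_files_eol.splitlines():
        info, tab, path = line.partition("\t")
        fields = info.split()
        if not tab or not fields:
            continue  # not a line in the expected format
        index_status = fields[0].removeprefix("i/")
        if index_status in ("crlf", "mixed"):
            problems.append(path)
    return problems


if __name__ == "__main__":
    output = subprocess.run(
        ["git", "ls-files", "--eol"], capture_output=True, text=True, check=True
    ).stdout
    for path in find_unnormalized(output):
        print(path)
```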
If you have any files left over in your worktree with the wrong line endings, you can either convert them as before, or you can delete them and check them out again, which will cause Git to apply the correct line endings.
In this example, cargo new always produces files with LF, even on Windows.
Note that in this case core.autocrlf=true was used, so all files effectively have the attributes text=auto eol=crlf, although ls-files will not show it.
$ git ls-files --eol
i/lf w/lf attr/ .gitignore
i/lf w/lf attr/ Cargo.toml
i/lf w/lf attr/ src/main.rs
$ rm .gitignore Cargo.toml src/main.rs
$ git checkout -- .
Updated 3 paths from the index
$ git ls-files --eol
i/lf w/crlf attr/ .gitignore
i/lf w/crlf attr/ Cargo.toml
i/lf w/crlf attr/ src/main.rs
Using core.autocrlf is fine, but you need to be aware of how it works.
Setting core.safecrlf to true is strongly recommended to catch errors as early as possible.
.gitattributes files are great for managing how Git treats your files, and there is no reason not to use them, even if only to set * text=auto for people who have core.autocrlf turned off. If you are contributing to a project that doesn't have one already, maybe suggest adding one.
If you are using Git attributes diligently, then you can turn core.autocrlf off, and Git will only do what you tell it to.
There is a global attributes file, pointed to by the core.attributesFile option (by default $XDG_CONFIG_HOME/git/attributes, usually ~/.config/git/attributes), so you can set * text=auto in there instead of using core.autocrlf.
You can also find its location via git var GIT_ATTR_GLOBAL.
One benefit of this over core.autocrlf is that the attribute is explicitly shown by git ls-files --eol.
There is also a repository-local one, in .git/info/attributes, which you can use to apply attributes only to your personal clone of a project.