Line ending format used in OS:
- Windows:
CR
(Carriage Return\r
) andLF
(LineFeed\n
) pair - OSX, Linux:
LF
(LineFeed\n
)
We can configure git to auto-correct line ending formats for each OS in two ways.
- Git Global configuration
- Using
.gitattributes
file
Global Configuration
In Linux/OSX
git config --global core.autocrlf input
This will fix any CRLF
to LF
when you commit.
In Windows
git config --global core.autocrlf true
This will make sure that, when you checkout in windows, all LF
will be converted to CRLF
.
.gitattributes File
It is a good idea to keep a .gitattributes
file as we don’t want to expect everyone in our team to set their own config. This file should be placed in the repository root and, if it exists, git will respect it.
* text=auto
This will treat all files as text files and convert to OS’s line ending on checkout and back to LF
on commit automatically. If you want to specify the line ending explicitly, you can use:
* text eol=crlf
* text eol=lf
The first one is for checkout and the second one is for commit.
*.jpg binary
This will treat all .jpg
images as binary files, regardless of path. So no conversion needed.
Or you can add path qualifiers:
my_path/**/*.jpg binary
Чтобы избежать проблем в объектах diff, можно настроить Git для правильной обработки окончаний строк.
Сведения об окончаниях строк
Каждый раз, когда вы нажимаете клавишу ВВОД на клавиатуре, вы вставляете в строку невидимый символ, называемый окончанием строки. Разные операционные системы обрабатывают окончания строк по-разному.
При совместной работе над проектами в Git и GitHub Git может выдавать непредвиденные результаты, например, если вы работаете на компьютере Windows, а ваш коллега внес изменение в macOS.
Чтобы эффективно взаимодействовать с пользователями, использующими разные операционные системы, вы можете настроить обработку окончаний строк в Git.
Глобальные параметры для окончаний строк
Команда git config core.autocrlf
используется для изменения способа обработки окончаний строк в Git. Она принимает один аргумент.
Параметры для отдельных репозиториев
При необходимости можно настроить .gitattributes
файл для управления тем, как Git считывает конец строки в определенном репозитории. При фиксации этого файла в репозитории Git переопределяет параметр core.autocrlf
для всех участников репозитория. Это гарантирует согласованное поведение для всех пользователей независимо от параметров и среды Git.
Файл .gitattributes
должен быть создан в корне репозитория и зафиксирован, как и любой другой файл.
.gitattributes
Файл выглядит как таблица с двумя столбцами:
- В левом столбце содержатся имена файлов Git для сопоставления.
- В правом столбце содержатся конфигурации окончаний строк, которые Git должен использовать для соответствующих файлов.
Пример
Ниже приведен пример .gitattributes
файла. Его можно использовать в качестве шаблона для ваших репозиториев:
# Set the default behavior, in case people don't have core.autocrlf set.
* text=auto
# Explicitly declare text files you want to always be normalized and converted
# to native line endings on checkout.
*.c text
*.h text
# Declare files that will always have CRLF line endings on checkout.
*.sln text eol=crlf
# Denote all files that are truly binary and should not be modified.
*.png binary
*.jpg binary
Вы видите типы сопоставляемых файлов, разделенные пробелами (*.c
, *.sln
, *.png
), после которых указывается параметр — text
, text eol=crlf
, binary
. Мы рассмотрим некоторые возможные параметры ниже.
-
text=auto
Git будет обрабатывать файлы наилучшим образом. Это хороший вариант по умолчанию. -
text eol=crlf
Git будет всегда преобразовывать окончания строк вCRLF
при извлечении. Этот вариант следует использовать для файлов, которые должны поддерживать окончанияCRLF
, даже в OSX или Linux. -
text eol=lf
Git будет всегда преобразовывать окончания строк вLF
при извлечении. Этот вариант следует использовать для файлов, которые должны поддерживать окончания LF, даже в Windows. -
binary
Git поймет, что указанные файлы не являются текстом и изменять их не следует. Параметрbinary
также является псевдонимом для-text -diff
.
Обновление репозитория после изменения окончаний строк
После установки core.autocrlf
параметра или фиксации .gitattributes
файла Git автоматически изменяет конец строки, чтобы соответствовать новой конфигурации. Вы можете найти, что Git сообщает об изменениях в файлах, которые вы не изменили.
Чтобы убедиться, что все конец строки в репозитории соответствуют новой конфигурации, создайте резервную копию файлов с помощью Git, а затем удалите и восстановите все файлы, чтобы нормализовать конец строки.
-
Перед добавлением или фиксацией изменений убедитесь, что Git правильно применил конфигурацию. Например, Git автоматически определяет, являются ли файлы в репозитории текстовыми или двоичными файлами. Чтобы избежать повреждения двоичных файлов в репозитории, рекомендуется явно пометить файлы как двоичные в
.gitattributes
. Дополнительные сведения см. в разделе gitattributes — определение атрибутов на путь в документации по Git. -
Чтобы избежать потери локальных изменений в файлах в репозитории, добавьте и зафиксируйте все выдающиеся изменения, выполнив следующие команды.
Shell git add . -u git commit -m "Saving files before refreshing line endings"
-
Чтобы обновить все файлы в текущей ветви, чтобы отразить новую конфигурацию, выполните следующие команды.
Shell git rm -rf --cached . git reset --hard HEAD
-
Чтобы отобразить перезаписанные, нормализованные файлы, выполните следующую команду.
-
При необходимости, чтобы зафиксировать все невыполненные изменения в репозитории, выполните следующую команду.
Shell git commit -m "Normalize all the line endings"
Дополнительные материалы
- Настройка Git — атрибуты Git в книге Pro Git
- git-config на страницах руководств для Git
- Начало работы — первая настройка Git в книге Pro Git
- Учитывайте окончания строк, автор: Тим Клем
Line ending format used in OS:
- Windows:
CR
(Carriage Returnr
) andLF
(LineFeedn
) pair - OSX, Linux:
LF
(LineFeedn
)
We can configure git to auto-correct line ending formats for each OS in two ways.
- Git Global configuration
- Using
.gitattributes
file
Global Configuration
In Linux/OSX
git config --global core.autocrlf input
This will fix any CRLF
to LF
when you commit.
In Windows
git config --global core.autocrlf true
This will make sure that, when you checkout in windows, all LF
will be converted to CRLF
.
.gitattributes File
It is a good idea to keep a .gitattributes
file as we don’t want to expect everyone in our team to set their own config. This file should be placed in the repository root and. If it exists, git will respect it.
* text=auto
This will treat all files as text files and convert to OS’s line ending on checkout and back to LF
on commit automatically. If you want to specify the line ending explicitly, you can use:
* text eol=crlf
* text eol=lf
The first one is for checkout and the second one is for commit.
*.jpg binary
This will treat all .jpg
images as binary files, regardless of path. So no conversion needed.
Or you can add path qualifiers:
my_path/**/*.jpg binary
We always try to make a general Checks, so check should enforce user defined LineEnd symbol.
We will consider that idea as soon as we finish moratorium period for new Checks, if you ready to implement it , please be welcome at our experimental Check project — https://github.com/sevntu-checkstyle/sevntu.checkstyle.
This check is completely useless, as good VCS configuration will force local working copy to have Windows-like (CRLF) EOL-style on Windows clients and Linux EOL-style (LF) on Linux clients. In this way, the correctly configured clients should have multiple errors for checkstyle.
This contstraint is to be forced inside your VCS. For Stash you may use https://github.com/pbaranchikov/stash-eol-check, for gitolite-admin you should perform some BASH-scripting and so on.
@andrewgaul , what kind of VCS you have ? GIT manage it as pbaranchikov described.
Copy link
Contributor
Author
@pbaranchikov I agree that correctly configured git clients can warn about line endings via various hooks, although as a project admin using GitHub there is no real way to enforce this on either the server or our contributors’ clients. We enabled the previously mentioned RegexpMultiline
in jclouds/jclouds#717.
I don’t think this rule is a good idea. Using git, you have no control which EOL style your clients use. We may end up with rule that passes on CI, but fails for people using Windows.
At my work we are using git without enforcing line endings and we do enforce them with checkstyle in java sources.
Test inputs have different line endings to test that file based logic works as expected with any combination.
I completely agree with @andrewgaul — such a check could be very valuable imho. Inconsistent line endings can certainly cause problems, in particular for resources files (config files, csv files, etc).
Let’s assume for a second that git is used. Clients can decide how files are converted using the core.autocrlf feature. Some of them might have turned it off, leading to Windows clients pushing CRLF
line endings, which is not supposed to happen according to git standards. If your code assumes LF
line endings, a CSV file might become unreadable 😢
@mkordas Actually, you can configure line endings very nicely using the .gitattributes
file. See #1045 for example. You can also specify line endings per file type and git can also try to automatically detect and performs EOL conversion. Unfortunately, the jgit implementation that eclipse uses simply ignores the .gitattributes
file 😢
To sum it up: A check that verifies that only text/code files with LF
line endings are committed makes perfectly sense to me as ensures consistency of the code files and thus helps avoiding bugs that are really hard to track down.
@msteiger, generally I agree with you. I had issues with newlines in my projects too. I have just couple concerns:
- Let’s assume we have developers using Windows and CI on Linux and check is configured to enforce
LF
— Checkstyle would fail on development machines - Let’s assume we’ve added property e.g. «match system line separator» to enforce
LF
on Unix andCRLF
on Windows — Checkstyle would fail for all developers that haveautocrlf
disabled - Let’s assume we use
.gitattributes
file — then each entry from this file needs suppression in Checkstyle config… - Let’s assume developer has all files as
CRLF
on Windows but temporarily wants to execute bash script using Cygwin, sodos2unix
is required. Checkstyle would fail, even if then Git afterwards would manage that case.
I’m just not convinced to have the check that is so fragile, but maybe just good description (and warning) in documentation would be enough.
+1 @andrewgaul et al IMHO this is most certainly a good idea and would be great to have built-in! Note «(?s:rn.*)» is better because it will only match the first wrong newline, so less spammy logs (and perhaps even faster?). See also my blurb (blog post) about this on http://blog2.vorburger.ch/2015/06/eol.html.
There are bunch of VCS, so might be some VCS does not handle that correctly.
Somebody could make this Check if it is required, all are welcome to contribute.
This was referenced
Nov 17, 2016
Interesting related problem came up in #3557:
If you are in a multi-OS environment, and you use core.autocrlf, then e.g. when running on Windows git will transform LF to CR+LF on disk, right? So this kind of regexp check would fail there, even though it’s correct in the repo and would pass on *NIX.. (Actually HAS failed, in Checkstyle’s own build, some of which running on Windows; see #3557.)
So if such a new Check was developed it would have to have some sort of mode like «this project is part of a repo using core.autocrlf, and so if running on Windows OS then skip this check» — thoughts, anyone?
Or perhaps using core.autocrlf and having Windows build machines is just a bad idea? Reality seems to be that some people do build on Windows (including CS itself). Or just remove core.autocrlf from CS’s own repo?
perhaps using core.autocrlf and having Windows build machines is just a bad idea?
Businesses and private users are the ones who really stick to using Windows. Most businesses may not use Windows on production build machines, but they still need to be able to test them locally.
Abandoning windows from any project will diminish your user base, which is usually a bad idea.
repo using core.autocrlf
You would also need to take into account .gitattributes
which can force different line endings for specific files.
repo using core.autocrlf, and so if running on Windows OS then skip this check
Nothing prevents Linux users from turning option on. Maybe repo has legacy code and was developed and solely used by windows. You would have to skip all who use autocrlf
.
if such a new Check was developed it would have to have some sort of mode like «this project is part of a repo using core.autocrlf, and so if running on Windows OS then skip this check»
Why is it an issue to examine local copy (Checkstyle’s domain) and not examine the server’s copy (git’s domain)? Checkstyle may not be set to validate every file in repository too.
This sounds to me more like an issue not validating the commits going into the repository than the files on the local hard drive.
I definitely agree @vorburger and @msteiger. Such a check would be valuable.
Although there may exists some projects which need CRLF line endings — I believe more than 9/10 should stick with the LF and core.autocrlf = input
(on every platform). I’ve seen far too many accidental CRLF endings in my life.
@mkordas Please note that all 4 points raised by you are solved with a single Git option: autocrlf = input
.
And all who still disagree, please remember that this module, as every other, should be enabled when project would need this — nobody is forced to do so.
@gdemecki Now it’s more than year after my comment from above and I had so many fights with incorrect line endings in projects that I definitely see a need for such check. We just need to document it properly, ideally with some hints how to configure Git to be compatible with every option.
I am OK to make this Check , as I told mentioned before I will put approved label as we get out of moratorium period
gaul
mentioned this issue
Jul 7, 2017
This was referenced
Mar 6, 2019
Enforcement of same line ends in whole repo might be problematic if users keep scripts of different OS is same repo , example of problems when Linux line ends appear in windows scripts https://serverfault.com/a/429598
But Check that make line ends consistent in some set of files is useful, and user need to configure it properly.
This check is completely useless, as good VCS configuration will force local working copy to have Windows-like (CRLF) EOL-style on Windows clients and Linux EOL-style (LF) on Linux clients. In this way, the correctly configured clients should have multiple errors for checkstyle.
Concept of having your version control system be opinionated about line endings is rather new and controversial.
I certainly wouldn’t say that every setup that does not do line ending normalization on checkout/checkin is incorrect.
If you’ve ever worked on a project where developers use different operating systems, you know that line endings can be a peculiar source of frustration. This issue of CRLF vs. LF line endings is actually fairly popular—you’ll find tons of questions on StackOverflow about how to configure software like Git to play nicely with different operating systems.
The typical advice is to configure your local Git to handle line ending conversions for you. For the sake of comprehensiveness, we’ll look at how that can be done in this article, but it isn’t ideal if you’re on a large team of developers. If just one person forgets to configure their line endings correctly, you’ll need to re-normalize your line endings and recommit your files every time a change is made.
A better solution is to add a .gitattributes
file to your repo so you can enforce line endings consistently in your codebase regardless of what operating systems your developers are using. Before we look at how that’s done, we’ll briefly review the history behind line endings on Windows and Unix so we can understand why this issue exists in the first place.
History can be boring, though, so if you stumbled upon this post after hours of frustrated research, you can skip straight to A Simple .gitattributes
Config and grab the code. However, I do encourage reading the full post to understand how these things work under the hood—you’ll (hopefully) never have to Google line endings again!
Table of Contents
CRLF vs. LF: What Are Line Endings, Anyway?
To really understand the problem of CRLF vs. LF line endings, we need to brush up on a bit of typesetting history.
People use letters, numbers, and symbols to communicate with one another. It’s how you’re reading this post right now! But computers can only understand and work with numbers. Since the files on your computer consist of strings of human-readable characters, we need a system that allows us to convert back and forth between these two formats. The Unicode standard is that system—it maps characters like A
and z
to numbers, bridging the gap between human languages and the language of computers.
Notably, the Unicode standard isn’t just for visible characters like letters and numbers. A certain subset are control characters, also known as non-printing characters. They aren’t used to render visible characters; rather, they’re used to perform unique actions, like deleting the previous character or inserting a newline.
LF
and CR
are two such control characters, and they’re both related to line endings in files. Their history dates back to the era of the typewriter, so we’ll briefly look at how that works so you understand why we have two different control characters rather than just one. Then, we’ll look at how this affects the typical developer experience on a multi-OS codebase.
LF
: Line Feed
LF stands for “line feed,” but you’re probably more familiar with the term newline (the escape sequence n
). Simply put, this character represents the end of a line of text. On Linux and Mac, this is equivalent to the start of a new line of text. That distinction is important because Windows does not follow this convention. We’ll discuss why once we learn about carriage returns.
CR
: Carriage Return
CR (the escape sequence r
) stands for carriage return, which moves the cursor to the start of the current line. For example, if you’ve ever seen a download progress bar on your terminal, this is how it works its magic. By using the carriage return, your terminal can animate text in place by returning the cursor to the start of the current line and overwriting any existing text.
You may be wondering where the need for such a character originated (beyond just animating text, which happens to be a niche application). It’s a good question—and the answer will help us better understand why Windows uses CRLF
.
Typewriters and the Carriage Return
Back when dinosaurs roamed the earth, people used to lug around these chunky devices called typewriters.
You feed the device a sheet of paper fastened to a mechanical roll known as the carriage. With each keystroke, the typewriter prints letters using ink on your sheet of paper, shifting the carriage to the left to ensure that the next letter you type will appear to the right of the previous one. You can watch a typewriter being used in action to get a better sense for how this works.
Of course, once you run out of space on the current line, you’ll need to go down to the next line on your sheet of paper. This is done by rotating the carriage to move the paper up a certain distance relative to the typewriter’s “pen.” But you also need to reset your carriage so that the next character you type will be aligned to the left-hand margin of your paper. In other words, you need some way to return the carriage to its starting position. And that’s precisely the job of the carriage return: a metal lever attached to the left side of the carriage that, when pushed, returns the carriage to its starting position.
That’s all good and well, but you’re probably wondering how this is relevant in the world of computers, where carriages, levers, and all these contraptions seem obsolete. We’re getting there!
Teletypewriters and the Birth of CRLF
Moving on to the early 20th century, we arrive at the teletypewriter, yet another device predating the modern computer. Basically, it works exactly the same way that a typewriter does, except instead of printing to a physical sheet of paper, it sends your message to a receiving party via a transmitter, either over a physical wire or radio waves.
Now we’re digital! These devices needed to use both a line feed character (LF
) and a carriage return character (CR
) to allow you to type from the start of the next line of text. That’s exactly how the original typewriter worked, except it didn’t have any notion of “characters” because it was a mechanically operated device. With the teletype, this process is more or less automatic and triggered by a keystroke—you don’t have to manually push some sort of “carriage” or move a sheet of paper up or down to achieve the same effect.
It’s easier to visualize this if you think of LF
and CR
as representing independent movements in either the horizontal or vertical direction, but not both. By itself, a line feed moves you down vertically; a carriage return resets your “cursor” to the very start of the current line. We saw the physical analogue of CR
and LF
with typewriters—moving to the next line of text required rotating the carriage to move the sheet of paper up (line feed), and returning your “cursor” to the start of that new line required using a mechanical piece aptly named the carriage return.
Teletypes set the standard for CRLF
line endings in some of the earliest operating systems, like the popular MS-DOS. Microsoft has an excellent article explaining the history of CRLF
in teletypes and early operating systems. Here’s a relevant snippet:
This protocol dates back to the days of teletypewriters. CR stands for “carriage return” – the CR control character returned the print head (“carriage”) to column 0 without advancing the paper. LF stands for “linefeed” – the LF control character advanced the paper one line without moving the print head. So if you wanted to return the print head to column zero (ready to print the next line) and advance the paper (so it prints on fresh paper), you need both CR and LF.
If you go to the various internet protocol documents, such as RFC 0821 (SMTP), RFC 1939 (POP), RFC 2060 (IMAP), or RFC 2616 (HTTP), you’ll see that they all specify CR+LF as the line termination sequence. So the the real question is not “Why do CP/M, MS-DOS, and Win32 use CR+LF as the line terminator?” but rather “Why did other people choose to differ from these standards documents and use some other line terminator?”
Why is the line terminator CR+LF?
MS-DOS used the two-character combination of CRLF
to denote line endings in files, and modern Windows computers continue to use CRLF
as their line ending to this day. Meanwhile, from its very inception, Unix used LF
to denote line endings, ditching CRLF
for consistency and simplicity. Apple originally used only CR
for Mac Classic but eventually switched to LF
for OS X, consistent with Unix.
This makes it seem like Windows is the odd one out when it’s technically not. Developers usually get frustrated with line endings on Windows because CRLF
is seen as an artifact of older times, when you actually needed both a carriage return and a line feed to represent newlines on devices like teletypes.
It’s easy to see why CRLF
is redundant by today’s standards—using both a carriage return and a line feed assumes that you’re bound to the physical limitations of a typewriter, where you had to explicitly move your sheet of paper up and then reset the carriage to the left-hand margin. With a file, it suffices to define the newline character as implicitly doing the job of both a line feed and a carriage return under the hood. In other words, so long as your operating system defines the newline character to mean that the next line starts at the beginning and not at some arbitrary column offset, then we have no need for an explicit carriage return in addition to a line feed—one symbol can do the job of both.
While it may seem like a harmless difference between operating systems, this issue of CRLF vs. LF has been causing people headaches for a long time now. For example, basic Windows text editors like Notepad used to not be able to properly interpret LF
alone as a true line ending. Thus, if you opened a file created on Linux or Mac with Notepad, the line endings would not get rendered correctly. Notepad was later updated in 2018 to support LF
.
As you can probably imagine, the lack of a universal line ending presents a dilemma for software like Git, which relies on very precise character comparisons to determine if a file has changed since the last time it was checked in. If one developer uses Windows and another uses Mac or Linux, and they each save and commit the same files, they may see line ending changes in their Git diffs—a conversion from CRLF
to LF
or vice versa. This leads to unnecessary noise due to single-character changes and can be quite annoying.
For this reason, Git allows you to configure line endings in one of two ways: by changing your local Git settings or by adding a .gitattributes
file to your project. We’ll look at both approaches over the course of the next several sections.
All Line Ending Transformations Concern the Index
Before we look at any specifics, I want to clarify one detail: All end-of-line transformations in Git occur when moving files in and out of the index—the temporary staging area that sits between your local files (working tree) and the repository that later gets pushed to your remote. When you stage files for a commit, they enter the index and may be subject to line ending normalization (depending on your settings). Conversely, when you check out a branch or a set of files, you’re moving files out of the index and into your working tree.
When normalization is enabled, line endings in your local and remote repository will always be set to LF
and never CRLF
. However, depending on some other settings, Git may silently check out files into the working tree as CRLF
. Unlike the original problem described in this article, this will not pollute git status
with actual line ending changes—it’s mainly used to ensure that Windows developers can take advantage of CRLF
locally while always committing LF
to the repo.
We’ll learn more about how all of this works in the next few sections.
Configuring Line Endings in Git with core.autocrlf
As I mentioned in the intro, you can tell Git how you’d like it to handle line endings on your system with the core.autocrlf
setting. While this isn’t the ideal approach for configuring line endings in a project, it’s still worth taking a brief look at how it works.
You can enable end-of-line normalization in your Git settings with the following command:
git config --global core.autocrlf [true|false|input]
You can also view the current Git setting using this command:
git config --list
By default, core.autocrlf
is set to false
on a fresh install of Git, meaning Git won’t perform any line ending normalization. Instead, Git will defer to the core.eol
setting to decide what line endings should be used; core.eol
defaults to native
, which means it depends on the OS you’re using. That’s not ideal because it means that CRLF
may make its way into your code base from Windows devs.
That leaves us with two options if we decide to configure Git locally: core.autocrlf=true
and core.autocrlf=input
. The line endings for these options are summarized below.
Both of these options enable automatic line ending normalization for text files, with one minor difference: core.autocrlf=true
converts files to CRLF
on checkout from the repo to the working tree, while core.autocrlf=input
leaves the working tree untouched.
For this reason, core.autocrlf=true
tends to be recommended setting for Windows developers since it guarantees LF
in the remote copy of your code while allowing you to use CRLF
in your working tree for full compatibility with Windows editors and file formats.
Normalizing Line Endings in Git with .gitattributes
You certainly could ask all your developers to configure their local Git. But this is tedious, and it can be confusing trying to recall what these options mean since their recommended usage depends on your operating system. If a developer installs a new environment or gets a new laptop, they’ll need to remember to reconfigure Git. And if a Windows developer forgets to read your docs, or someone from another team commits to your repo, then you may start seeing line ending changes again.
Fortunately, there’s a better solution: creating a .gitattributes
file at the root of your repo to settle things once and for all. Git uses this config to apply certain attributes to your files whenever you check out or commit them. One popular use case of .gitattributes
is to normalize line endings in a project. With this config-based approach, you can ensure that your line endings remain consistent in your codebase regardless of what operating systems or local Git settings your developers use since this file takes priority. You can learn more about the supported .gitattributes
options in the official Git docs.
A Simple .gitattributes
Config
The following .gitattributes
config normalizes line endings to LF
for all text files checked into your repo while leaving local line endings untouched in the working tree:
* text=auto
Add the file to the root of your workspace, commit it, and push it to your repo.
Let’s also understand how it works.
First, the wildcard selector (*
) matches all files that aren’t gitignored. These files become candidates for end-of-line normalization, subject to any attributes you’ve specified. In this case, we’re using the text
attribute, which normalizes all line endings to LF
when checking files into your repo. However, it does not modify line endings in your working tree. This is essentially the same as setting core.autocrlf=input
in your Git settings.
More specifically, the text=auto
option tells Git to only normalize line endings to LF
for text files while leaving binary files (images, fonts, etc.) untouched. This distinction is important—we don’t want to corrupt binary files by modifying their line endings.
After committing the .gitattributes
file, your changes won’t take effect immediately for files checked into Git prior to the addition of .gitattributes
. To force an update, you can use the following command since Git 2.16:
git add --renormalize .
This updates all tracked files in your repo according to the rules defined in your .gitattributes
config. If previously committed text files used CRLF
in your repo and are converted to LF
during the renormalization process, those files will be staged for a commit. You can then check if any files were modified like you would normally:
git status
The only thing left to do is to commit those changes (if any) and push them to your repo. In the future, anytime a new file is checked into Git, it’ll use LF
for line endings.
Verifying Line Endings in Git for Any File
If you want to verify that the files in your repo are using the correct line endings after all of these steps, you can run the following command:
git ls-files --eol
Or only for a particular file:
git ls-files path/to/file --eol
For text files, you should see something like this:
i/lf w/crlf attr/text=auto file.txt
From left to right, those are:
i
: line endings in Git’s index (and, by extension, the repo). Should belf
for text files.w
: line endings in your working tree. May be eitherlf
orcrlf
for text files.attr
: The attribute that applies to the file. In this example, that’stext=auto
.- The file name itself.
For binary files like images, note that you’ll see -text
for both the index and working tree line endings. This means that Git correctly isolated those binary files, leaving them untouched:
i/-text w/-text attr/text=auto image.png
Git Line Endings: Working Tree vs. Index
You may see the following message when you stage files containing CRLF
line endings locally (e.g., if you’re on Windows and introduced a new file, or if you’re not on Windows and renormalized the line endings for your codebase):
warning: CRLF will be replaced by LF in <file-name>.
The file will have its original line endings in your working directory.
This is working as expected—CRLF
will be converted to LF
when you commit your changes, meaning that when you push those files to your remote, they’ll use LF
. Anyone who later pulls or checks out that code will see LF
line endings locally for those files.
But the text
attribute doesn’t change line endings for the local copies of your text files (i.e., the ones in Git’s working tree)—it only changes line endings for files in the repo. Hence the second line of the message, which notes that the text files you just renormalized may still continue to use CRLF
locally (on your file system) if that’s the line ending with which they were originally created/cloned on your system. Rest assured that text files will never use CRLF
in the remote copy of your code.
The eol
Attribute: Controlling Line Endings in Git’s Working Tree
Sometimes, you actually want files to be checked out locally on your system with CRLF
while still retaining LF
in your repo. Usually, this is for Windows-specific files that are very sensitive to line ending changes. Batch scripts are a common example since they need CRLF
line endings to run properly. It’s okay to store these files with LF
line endings in your repo, so long as they later get checked out with the correct line endings on a Windows machine. You can find a more comprehensive list of files that need CRLF
line endings in the following article: .gitattributes
Best Practices.
When we configured our local Git settings, we saw that you can achieve this desired behavior with core.autocrlf=true
. The .gitattributes
equivalent of this is using the eol
attribute, which enables LF
normalization for files checked into your repo but also allows you to control which line ending gets applied in Git’s working tree:
eol=lf
: converts toLF
on checkout.eol=crlf
: converts toCRLF
on checkout.
In the case of batch scripts, we’d use eol=crlf
:
# All files are checked into the repo with LF
* text=auto
# These files are checked out using CRLF locally
*.bat eol=crlf
In this case, batch scripts will have two non-overlapping rules applied to them additively: text=auto
and eol=crlf
.
This change won’t take effect immediately, so if you run git ls-files --eol
after updating your .gitattributes
file, you might still see LF
line endings in the working tree. To update existing line endings in your working tree so they respect the eol
attribute, you’ll need to run the following set of commands per this StackOverflow answer:
git rm --cached -r .
git reset --hard
You’ll notice that this command differs from git add --renormalize .
, which we previously used to update line endings in the local repo. Now, we’re updating line endings in the working tree to reflect our eol
preferences. If you now you run git ls-files --eol
, you should see i/lf w/crlf
for any files matching the specified pattern.
One final note: In the recommended .gitattributes
file, we used * text=auto
to mark all text files for end-of-line normalization to LF
once they’re staged in Git’s index. We could’ve also done * text=auto eol=lf
, although these two are not identical. Like I mentioned before, if you only use * text=auto
, you may still see some CRLF
line endings locally in your working tree; this is okay and is working as expected. If you don’t want this, you can enforce * text=auto eol=lf
instead. However, this is usually not necessary because the main concern is about what line endings make it into the index and your repo.
Summary: Git Config vs. .gitattributes
There are some similarities between Git’s local settings and the Git attributes we looked at. The table below lists each Git setting, its corresponding .gitattributes
rule, and the line endings for text files in the index and working tree:
Bonus: Create an .editorconfig
File
A .gitattributes
file is technically all that you need to enforce the line endings in the remote copy of your code. However, as we just saw, you may still see CRLF
line endings on Windows locally because .gitattributes
doesn’t tell Git to change the working copies of your files.
Again, this doesn’t mean that Git’s normalization process isn’t working; it’s just the expected behavior. However, this can get annoying if you’re also linting your code with ESLint and Prettier, in which case they’ll constantly throw errors and tell you to delete those extra CR
s:
Fortunately, we can take things a step further with an .editorconfig
file; this is an editor-agnostic project that aims to create a standardized format for customizing the behavior of any given text editor. Lots of text editors (including VS Code) support and automatically read this file if it’s present. You can put something like this in the root of your workspace:
root = true
[*]
end_of_line = lf
In addition to a bunch of other settings, you can specify the line ending that should be used for any new files created through this text editor. That way, if you’re on Windows using VS Code and you create a new file, you’ll always see line endings as LF
in your working tree. Linters are happy, and so is everyone on your team!
Summary
That was a lot to take in, but hopefully you now have a better understanding of the whole CRLF vs. LF debate and why this causes so many problems for teams that use a mixture of Windows and other operating systems. Whereas Windows follows the original convention of a carriage return plus a line feed (CRLF
) for line endings, operating systems like Linux and Mac use only the line feed (LF
) character. The history of these two control characters dates back to the era of the typewriter. While this tends to cause problems with software like Git, you can specify settings at the repo level with a .gitattributes
file to normalize your line endings regardless of what operating systems your developers are using. You can also optionally add an .editorconfig
file to ensure that new files are always created with LF
line endings, even on Windows.
Attributions
Social media preview: Photo by Katrin Hauf (Unsplash).
Git tries to help translate line endings between operating systems with different standards. This gets sooo frustrating. Here’s what I always want:
On Windows:
git config --global core.autocrlf input
This says, “If I commit a file with the wrong line endings, fix it before other people notice.” Otherwise, leave it alone.
On Linux, Mac, etc:
git config --global core.autocrlf false
This says, “Don’t screw with the line endings.”
Nowhere:
git config --global core.autocrlf true
This says, “Screw with the line endings. Make them all include carriage return on my filesystem, but not have carriage return when I push to the shared repository.” This is not necessary.
Windows and Linux on the same files:
This happens when you’re running Linux in a docker container and mounting files that are stored on Windows. Generally, stick with the Windows strategy of core.autocrlf=input
, unless you have .bat
or .cmd
(Windows executables) in your repository.
The VS Code docs have tips for this case. They suggest setting up the repository with a .gitattributes
file that says “mostly use LF as line endings, but .bat
and .cmd
files need CR+LF”:
* text=auto eol=lf *.{cmd,[cC][mM][dD]} text eol=crlf *.{bat,[bB][aA][tT]} text eol=crlf
If VSCode insists on putting CR (pictured as ^M in the git diff) in a file, then open the file and check the lower right-hand corner, in the status bar. Does it say “CRLF”? Click that and choose “LF” instead. Do things right, VSCode 😠
Troubleshooting
When git is surprising you:
Check for overrides
Within a repository, the .gitattributes
file can override the autocrlf
behavior for all files or sets of files. Watch out for the text
and eol
attributes. It is incredibly complicated.
Check your settings
To find out which one is in effect for new clones:git config --global --get core.autocrlf
Or in one repository of interest:git config --local --get core.autocrlf
Why is it set that way? Find out:git config --list --show-origin
This shows all the places the settings are set. Including duplicates — it’s OK for there to be multiple entries for one setting.
Why does this even exist?
Historical reasons, of course! (If you have a Ruby Tapas subscription, there’s a great little history lesson on this.)
Back in the day, many Windows programs expected files to have line endings marked with CR+LF characters (carriage return + line feed, or rn
). These days, these programs work fine with either CR+LF or with LF alone. Meanwhile, Linux/Mac programs expect LF alone.
Use LF alone! There’s no reason to include the CR characters, even if you’re working on Windows.
One danger: new files created in programs like Notepad get CR+LF. Those files look like they have r
on every line when viewed in Linux/Mac programs or (in code) read into strings and split on n
.
That’s why, on Windows, it makes sense to ask git to change line endings from CR+LF to LF on files that it saves. core.autocrlf=input
says, screw with the line endings only in one direction. Don’t add CR, but do take it away before other people see it.
Postscript
I love ternary booleans like this: true, false, input. Hilarious! This illustrates: don’t use booleans in your interfaces. Use enums instead. Names are useful. autocrlf=ScrewWithLineEndings|GoAway|HideMyCRs
Символы конца строки EOL для текстовых файлов различаются в зависимости от операционной системы. Linux использует перевод строки LF, Windows использует возврат каретки + перевод строки CRLF. Если несколько разработчиков работают над одним проектом на GitHub под разными операционными системами — бардак практически гарантирован.
Главное, что нужно помнить — в репозитории все текстовые файлы должны быть с окончаниями LF.
Настройки EOL для Git
Настройка core.eol
имеет значение по умолчанию native
, другие возможные значения — это lf
и crlf
. Git использует значение этой настройки, когда записывает файлы в рабочую директорию при выполнении таких команд, как git checkout
или git clone
. Имеет смысл, только если core.autocrlf
равно true
.
Настройка core.autocrlf
имеет значение по умолчанию false
, другие возможные значения — это true
и input
. Настройка определяет, будет ли Git выполнять какие-либо преобразования EOL при записи/чтении в/из репозитория. Значение по умолчанию опасно, потому что может привести к записи в репозиторий CRLF файлов.
core.autocrlf=false
— ничего не делать при записи в репозиторий, ничего не делать при чтении из репозиторияcore.autocrlf=input
— при записи в репозиторий заменять CRLF на LF, при чтении из репозитория ничего не делатьcore.autocrlf=true
— при записи в репозиторий заменять CRLF на LF, при чтении из репозитория заменять LF наcore.eol
Значение input
подходит при работе под Linux:
$ git config --local core.eol native $ git config --local core.autocrlf input
Значение true
подходит при работе под Windows:
$ git config --local core.eol native $ git config --local core.autocrlf true
При выполнении этих команд будет создан файл .git/config
в директории проекта:
[core] eol = native autocrlf = input
[core] eol = native autocrlf = true
Можно записать эти значения в глобальный файл конфигурации Git ~/.gitconfig
, если заменить --local
на --global
.
Все настройки Git
Поскольку мы тут работаем с настройками Git, есть смысл упомянуть, какие они бывают и как их посмотреть.
- Системная конфигурация Git управляет настройками для всех пользователей и всех репозиториев на компьютере.
- Глобальная конфигурация Git управляет настройками текущего вошедшего пользователя и всех его репозиториев.
- Локальная конфигурация Git управляет настройками для отдельно взятого репозитория.
Эти три файла конфигурации выполняются в каскадном порядке — сначала системный, затем глобальный, и наконец, локальный. Это означает, что локальная конфигурация Git всегда будет перезаписывать настройки, установленные в глобальной или системной конфигурации.
$ git config --list $ git config --list --system $ git config --list --global $ git config --list --local
Если не указать, какую конфигурацию надо показать (первая команда) — будут показаны все три конфигурации, объединенные в вывод консоли. Чтобы посмотреть настройки вместе с именем файла конфигурации, можно использовать ключ show-origin
.
$ git config --list --show-origin file:C:/Program Files/Git/etc/gitconfig http.sslcainfo=C:/Program Files/Git/mingw64/ssl/certs/ca-bundle.crt file:C:/Program Files/Git/etc/gitconfig http.sslbackend=openssl file:C:/Program Files/Git/etc/gitconfig diff.astextplain.textconv=astextplain .......... file:C:/Users/Evgeniy/.gitconfig user.name=Evgeniy Tokmakov file:C:/Users/Evgeniy/.gitconfig user.email=............... file:C:/Users/Evgeniy/.gitconfig core.autocrlf=false .......... file:.git/config core.repositoryformatversion=0 file:.git/config core.filemode=false file:.git/config core.bare=false ..........
$ git config --list --show-origin | grep autocrlf file:C:/Program Files/Git/etc/gitconfig core.autocrlf=true file:C:/Users/Evgeniy/.gitconfig core.autocrlf=false file:.git/config core.autocrlf=true
Небольшой эксперимент
У меня операционная система Windows. Создаем директорию repo-eol-example
, внутри нее — текстовой файл file.txt
. Добавим в файл пару строк и убедимся, что окончания строк — CRLF.
Переходим в директорию проекта, выполняем три команды
$ git init $ git config --local core.eol native $ git config --local core.autocrlf true
Добавляем наш файл в индекс и фиксируем изменения
$ git add file.txt $ git commit -m "add file.txt"
Добавляем в наш файл еще строку, чтобы он изменился
И восстановим его из репозитория в изначальном виде
$ git checkout -- file.txt
Что произошло? При добавлении файла в репозиторий (commit) символы CRLF были заменены на LF. При извлечении файла в рабочую директорию (checkout) — символы LF были заменены на CRLF.
Давайте убедимся в том, что в репозитории у нас символы LF. Для этого изменим настройку Git, чтобы вообще никаких замен не было. Добавим в файл строку, а потом восстановим из репозитория в изначальном виде.
$ git config --local core.autocrlf false $ git checkout -- file.txt
Что произошло? При извлечении файла в рабочую директорию — символы EOL остались без изменений, как они сохранены в репозитории.
Предупреждения от Git
Когда случается нештатная ситуация — Git предупреждает об этом. Например, если мы установили следующие настройки для Git:
$ git config --local core.eol native $ git config --local core.autocrlf input
И пытаемся записать CRLF файл в репозиторий — Git предупреждает, что символы CRLF будут заменены на LF (при записи в репозиторий). Тут ситуация явно нештатная — вроде бы настройки соответствуют Linux, но при этом в рабочей директории откуда-то взялся CRLF файл, а этого быть не должно.
$ git add other.txt warning: CRLF will be replaced by LF in other.txt. The file will have its original line endings in your working directory
При извлечении такого файла из репозитория в рабочую директорию — никаких преобразований EOL не будет, потому что input
работает только при записи в репозиторий. И мы получим LF окончания строк в этом файле — так, как и должно быть в Linux.
Еще одна нештатная ситуация — мы установили следующие настройки для Git:
$ git config --local core.eol native $ git config --local core.autocrlf auto
И пытаемся записать LF файл в репозиторий — Git предупреждает, что символы LF будут заменены на CRLF (при чтении из репозитория). Тут ситуация явно нештатная — вроде бы настройки соответствуют Windows, но при этом в рабочей директории откуда-то взялся LF файл, а этого быть не должно.
$ git add another.txt warning: LF will be replaced by CRLF in another.txt. The file will have its original line endings in your working directory
При извлечении такого файла из репозитория в рабочую директорию — будет выполнена замена LF на CRLF. И мы получим CRLF окончания строк в этом файле — так, как и должно быть в Windows.
Тут важно то, что как в первой, так и во второй ситуации — файл будет сохранен в репозитории с LF окончаниями строк, как и должно быть.
Настройка core.safecrlf
Как Git узнает, что файл является текстовым? У Git есть внутренний метод эвристической проверки, является ли файл двоичным или нет. Файл считается текстовым, если он не является двоичным. Git иногда может ошибаться — и по этой причине существует настройка core.safecrlf
.
Эту настройку нужно установить в значение true
. Тогда при подготовке к замене CRLF на LF — Git проверит, что сможет успешно отменить операцию. Это защита от того, чтобы выполнить замену в файле, который не является текстовым — и, тем самым, безнадежно его испортить.
Лично мне удобно везде использовать LF, хотя у меня основная система Windows — поэтому установил себе настройки, чтобы вообще не заменять EOL.
$ git config --global core.eol native $ git config --global core.autocrlf false
Современные IDE способны работать под Windows с EOL как в Linux, так что необходимости в заменах просто нет. В настройках VS Code у меня установлено значение LF для EOL.
{ .......... "files.eol": "n", // символ конца строки как в linux .......... }
Чтобы следить за символами конца строки — можно установить расширение «Render Line Endings», которое показывает символы LF и CRLF.
{ .......... "editor.renderWhitespace": "all", // показывать символы пробелов "files.eol": "n", // символ конца строки как в linux .......... "code-eol.newlineCharacter": "↓", // символ LF "code-eol.crlfCharacter" : "←↓", // символы CRLF // подсвечивать как ошибку EOL в файле, если не совпадает с настройкой files.eol "code-eol.highlightNonDefault": true, }
Когда в проект случайно попадёт файл с CRLF символами конца строки — эти символы будут подсвечены красным цветом (вообще, цветом errorForeground
темы).
Но такая подсветка будет всего секунду, потому что у меня еще настроено автосохранение открытых файлов — но этой секунды достаточно, чтобы увидеть проблему и отреагировать.
{ .......... "editor.renderWhitespace": "all", // показывать символы пробелов "files.eol": "n", // символ конца строки как в linux "files.autoSave": "afterDelay", // автоматическое сохранение файла "files.autoSaveDelay": 1000, // задержка перед сохранением файла .......... "code-eol.newlineCharacter": "↓", // символ LF "code-eol.crlfCharacter" : "←↓", // символы CRLF // подсвечивать как ошибку EOL в файле, если не совпадает с настройкой files.eol "code-eol.highlightNonDefault": true, }
Чтобы настройки VS Code всегда были правильными, можно создать файл .editorconfig
в корне проекта и установить расширение «EditorConfig for VS Code». Расширение читает файл .editorconfig
и устанавливает правильные настройки VS Code.
# эта настройка должна быть в самом начале; если установлена в true, # парсер не будет искать другие конфиги родительских директориях root = true # правила для текстовых файлов [*.{txt,md,html,css,scss,js,jsx,ts,tsx,py,php,json,xml,sh}] # кодировка файлов charset = utf-8 # концы строк как в linux end_of_line = lf # пустая строка в конце файла insert_final_newline = true # удалять пробелы в конце строк trim_trailing_whitespace = true # заменять табуляцию на пробелы indent_style = space # табуляция заменяется 4 пробелами indent_size = 4
{ .......... "files.encoding": "utf8", // кодировка файлов "files.eol": "n", // концы строк как в linux "files.insertFinalNewline": true, // пустая строка в конце файла "files.trimTrailingWhitespace": true, // удалять пробелы в конце строк "editor.insertSpaces": true, // заменять табуляцию на пробелы "editor.tabSize": 4, // табуляция заменяется 4 пробелами .......... }
Еще лучше — разместить файл .editorconfig
в корне директории, которая содержит все проекты, над которыми идет работа. Тогда при открытии любого проекта VS Code будет подхватывать этот файл и его не надо будет создавать отдельно для каждого проекта.
Работа в команде
В настоящее время настройку core.autocrlf
использовать нежелательно. На смену ей пришел файл .gitattributes
в корне рабочей директории проекта, который нужно добавить под наблюдение Git.
* text=auto
$ git add .gitattributes $ git commit -m "Add .gitattributes"
Тем самым мы говорим Git, чтобы он самостоятельно определял текстовые файлы и заменял CRLF на LF при записи в репозиторий. Это эквивалентно установке core.autocrlf=true
в файле конфигурации, но файл .gitattributes
имеет приоритет над файлом конфигурации.
Таким образом, у всех разработчиков, которые работают над одним проектом, будет одинаковое поведение Git при записи в репозиторий. А вот настройка core.eol
у каждого разработчика будет своя, из файла конфигурации на компьютере. И извлекать файлы в рабочую директорию разработчик может с любыми окончаниями — LF или CRLF.
Если файла .gitattributes
нет — Git по старинке будет использовать core.autocrlf
из файла конфигурации для замены символов EOL.
Если случилась беда
Все-таки это произошло — в репозиторий попали CRLF файлы. Проверить это можно с помощью команды
$ git ls-files --eol i/crlf w/crlf attr/ file-crlf-one.txt i/crlf w/crlf attr/ file-crlf-two.txt i/lf w/lf attr/ file-lf-one.txt i/lf w/lf attr/ file-lf-two.txt
Первая колонка — окончания строк в репозитории, вторая колонка — окончания строк в рабочей директории. Такая команда может выдать несколько тысяч строк, а нам интересно — есть ли вообще в репозитории такие файлы, так что нужен фильтр.
$ git ls-files --eol | grep "i/crlf" i/crlf w/crlf attr/ file-crlf-one.txt i/crlf w/crlf attr/ file-crlf-two.txt
Давайте наведем порядок — создадим файл .gitattributes
, добавим его в репозиторий, выполним команду нормализации EOL в репозитории.
* text=auto
$ git add .gitattributes $ git commit -m "Add .gitattributes" [master 347c98e] Add .gitattributes 1 file changed, 1 insertion(+) create mode 100644 .gitattributes
$ git add --renormalize .
$ git status On branch master Changes to be committed: (use "git restore --staged <file>..." to unstage) modified: file-crlf-one.txt modified: file-crlf-two.txt $ git commit -m "Normalize eol" [master e54c4b7] Normalize eol 2 files changed, 4 insertions(+), 4 deletions(-)
Смотрим, что у нас теперь в репозитории — все хорошо, все окончания строк сейчас LF:
$ git ls-files --eol i/none w/none attr/text=auto .gitattributes i/lf w/crlf attr/text=auto file-crlf-one.txt i/lf w/crlf attr/text=auto file-crlf-two.txt i/lf w/lf attr/text=auto file-lf-one.txt i/lf w/lf attr/text=auto file-lf-two.txt
Теперь надо заменить файлы в рабочей директории, для этого выполняем две команды:
$ git rm --cached -r . rm '.gitattributes' rm 'file-crlf-one.txt' rm 'file-crlf-two.txt' rm 'file-lf-one.txt' rm 'file-lf-two.txt' $ git reset --hard HEAD is now at e54c4b7 Normalize eol
Смотрим, что у нас теперь в рабочей директории (у меня Windows и core.eol=native
):
$ git ls-files --eol i/none w/none attr/text=auto .gitattributes i/lf w/crlf attr/text=auto file-crlf-one.txt i/lf w/crlf attr/text=auto file-crlf-two.txt i/lf w/crlf attr/text=auto file-lf-one.txt i/lf w/crlf attr/text=auto file-lf-two.txt
Дополнительно
- Mind the End of Your Line
- Normalizing Line Endings in Git
Поиск:
Git • Linux • Web-разработка • Windows • Конфигурация • Настройка • EOL • CRLF • LF • Файл • IDE
Каталог оборудования
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.
Производители
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.
Функциональные группы
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.
Хотя RegexpMultiline
может это сделать, пользователи не могут его обнаружить:
<module name="RegexpMultiline">
<property name="format" value="rn"/>
<property name="message" value="Do not use Windows line endings"/>
</module>
Все 17 Комментарий
Мы всегда стараемся выполнять общие проверки, поэтому проверка должна обеспечивать соблюдение определяемого пользователем символа LineEnd.
Мы рассмотрим эту идею, как только закончится период моратория на новые проверки. Если вы готовы ее реализовать, добро пожаловать в наш экспериментальный проект проверки — https://github.com/sevntu-checkstyle/sevntu.checkstyle.
Эта проверка совершенно бесполезна, так как хорошая конфигурация VCS заставит локальную рабочую копию иметь Windows-подобный (CRLF) стиль EOL на клиентах Windows и стиль Linux EOL (LF) на клиентах Linux. Таким образом, правильно настроенные клиенты должны иметь несколько ошибок для стиля проверки.
Это ограничение должно быть принудительно выполнено внутри вашей VCS. Для Stash вы можете использовать https://github.com/pbaranchikov/stash-eol-check , для gitolite-admin вы должны выполнить некоторые сценарии BASH и так далее.
@andrewgaul , какая у вас система VCS? GIT управляет этим, как описано пбаранчиков.
@pbaranchikov Я согласен с тем, что правильно настроенные клиенты git могут предупреждать о завершении строк с помощью различных хуков, хотя у администратора проекта, использующего GitHub, нет реального способа принудительно применить это ни на сервере, ни на клиентах наших участников. Мы включили ранее упомянутую RegexpMultiline
в jclouds / jclouds # 717.
Я не думаю, что это правило — хорошая идея. Используя git, вы не можете контролировать, какой стиль EOL используют ваши клиенты. Мы можем закончить с правилом, которое передает CI, но не работает для людей, использующих Windows.
В моей работе мы используем git без принудительного завершения строк, и мы применяем их с помощью checkstyle в исходных кодах java.
Тестовые входы имеют разные окончания строк, чтобы проверить, что логика на основе файлов работает должным образом с любой комбинацией.
Полностью согласен с @andrewgaul — такая проверка может быть очень ценной imho. Несогласованные окончания строк, безусловно, могут вызвать проблемы, особенно для файлов ресурсов (файлы конфигурации, файлы csv и т. Д.).
Предположим на секунду, что используется _git_. Клиенты могут решить, как файлы конвертировать, используя функцию core.autocrlf . Некоторые из них могли отключить его, что привело к тому, что клиенты Windows выдвинули окончание строки CRLF
, что не должно происходить в соответствии со стандартами git. Если ваш код предполагает окончание строки LF
, файл CSV может стать нечитаемым: cry:
@mkordas На самом деле, вы можете очень хорошо настроить окончания строк, используя файл .gitattributes
. См., Например, # 1045. Вы также можете указать окончания строк для каждого типа файла, и git также может попытаться автоматически обнаружить и выполнить преобразование EOL. К сожалению, реализация jgit, которую использует eclipse, просто игнорирует файл .gitattributes
: cry:
Подводя итог: проверка, которая проверяет, что фиксируются только текстовые / кодовые файлы с окончанием строки LF
, имеет для меня смысл, поскольку обеспечивает согласованность файлов кода и, таким образом, помогает избежать ошибок, которые действительно трудно отследить. .
@msteiger , в целом я с тобой согласен. У меня тоже были проблемы с новыми строками в моих проектах. У меня всего пара проблем:
- Предположим, у нас есть разработчики, использующие Windows и CI в Linux, и проверка настроена на принудительное выполнение
LF
— Checkstyle не сработает на машинах для разработки. - Предположим, мы добавили свойство, например, «сопоставить системный разделитель строк», чтобы применить
LF
в Unix иCRLF
в Windows — Checkstyle не сработает для всех разработчиков, у которыхautocrlf
отключено. - Предположим, мы используем файл
.gitattributes
— тогда каждая запись из этого файла требует подавления в конфигурации Checkstyle … - Предположим, что разработчик имеет все файлы как
CRLF
в Windows, но временно хочет выполнить сценарий bash с помощью Cygwin, поэтому требуетсяdos2unix
. Checkstyle не сработает, даже если Git впоследствии справится с этим случаем.
Я просто не уверен, что такая хрупкая проверка, но, возможно, достаточно хорошего описания (и предупреждения) в документации.
+1 @andrewgaul et al. ИМХО, это, безусловно, хорошая идея, и было бы здорово иметь встроенную программу! Обратите внимание: «(? S: r n. *)» Лучше, потому что он будет соответствовать только первой неправильной новой строке, поэтому меньше спамовых журналов (и, возможно, даже быстрее?). См. Также мою аннотацию (сообщение в блоге) об этом на http://blog2.vorburger.ch/2015/06/eol.html.
Существует множество VCS, поэтому, возможно, некоторые VCS не справляются с этим правильно.
Кто-нибудь может сделать эту проверку, если она требуется, все могут внести свой вклад.
Интересная проблема возникла в https://github.com/checkstyle/checkstyle/pull/3557 :
Если вы работаете в среде с несколькими ОС и используете core.autocrlf, то, например, при работе в Windows git преобразует LF в CR + LF на диске, верно? Таким образом, такая проверка регулярного выражения не удалась бы там, даже если она верна в репо и передала бы * NIX .. (На самом деле БЫЛА неудачна в собственной сборке Checkstyle, некоторые из которых работают в Windows; см. Https://github.com / checkstyle / checkstyle / pull / 3557.)
Итак, если бы такая новая проверка была разработана, она должна была бы иметь какой-то режим вроде «этот проект является частью репозитория с использованием core.autocrlf, и поэтому, если он работает в ОС Windows, пропустите эту проверку» — кто-нибудь думает?
Или, возможно, использование core.autocrlf и наличие машин для сборки Windows — это просто плохая идея? На самом деле, похоже, что некоторые люди действительно строят на Windows (включая сам CS). Или просто удалите core.autocrlf из собственного репо CS?
возможно, использование core.autocrlf и наличие машин для сборки Windows — это просто плохая идея?
Компании и частные пользователи — это те, кто действительно придерживается Windows. Большинство предприятий могут не использовать Windows на машинах для производственной сборки, но им все равно необходимо иметь возможность тестировать их локально.
Отказ от окон любого проекта приведет к уменьшению вашей пользовательской базы, что обычно является плохой идеей.
репо с использованием core.autocrlf
Вам также необходимо принять во внимание .gitattributes
который может заставить разные окончания строк для определенных файлов.
репо с использованием core.autocrlf, поэтому, если вы работаете в ОС Windows, пропустите эту проверку
Ничто не мешает пользователям Linux включить опцию. Возможно, репо имеет устаревший код и был разработан и используется исключительно Windows. Вам придется пропустить всех, кто использует autocrlf
.
если бы такая новая проверка была разработана, она должна была бы иметь какой-то режим вроде «этот проект является частью репо, использующего core.autocrlf, и поэтому, если он работает в ОС Windows, пропустите эту проверку»
Почему возникает проблема проверять локальную копию (домен Checkstyle), а не проверять копию сервера (домен git)? Checkstyle также не может быть установлен для проверки каждого файла в репозитории.
Для меня это больше похоже на проблему, не проверяющую коммиты, поступающие в репозиторий, чем на файлы на локальном жестком диске.
Я определенно согласен с @vorburger и @msteiger. Такая проверка была бы ценной.
Хотя могут существовать некоторые проекты, в которых нужны окончания строк CRLF — я считаю, что более 9/10 должны придерживаться LF и core.autocrlf = input
(на каждой платформе). Я видел слишком много случайных окончаний CRLF в своей жизни.
@mkordas Обратите внимание, что все 4 поднятых вами autocrlf = input
.
И все, кто все еще не согласен, пожалуйста, помните, что этот модуль, как и любой другой, должен быть включен, когда проект будет нуждаться в этом — никого не заставляют делать это .
@gdemecki Сейчас прошло больше года после моего комментария сверху, и у меня было так много конфликтов с неправильными окончаниями строк в проектах, что я определенно вижу необходимость в такой проверке. Нам просто нужно правильно задокументировать это, в идеале с некоторыми подсказками, как настроить Git для совместимости с каждым параметром.
Я могу сделать эту проверку, как я уже упоминал ранее, я поставлю утвержденную этикетку, когда мы выйдем из периода моратория.
Обеспечение того же конца строки во всем репо может быть проблематичным, если пользователи хранят скрипты разных ОС в одном репо, пример проблем, когда концы строки Linux появляются в скриптах Windows https://serverfault.com/a/429598
Но проверьте, чтобы концы строк согласовывались в некотором наборе файлов, и это полезно, и пользователю необходимо правильно это настроить.
Эта проверка совершенно бесполезна, так как хорошая конфигурация VCS заставит локальную рабочую копию иметь Windows-подобный (CRLF) стиль EOL на клиентах Windows и стиль Linux EOL (LF) на клиентах Linux. Таким образом, правильно настроенные клиенты должны иметь несколько ошибок для стиля проверки.
Идея о том, чтобы ваша система управления версиями была убеждена в окончании строк, довольно нова и противоречива.
Я бы, конечно, не сказал, что каждая установка, которая не выполняет нормализацию окончания строки при оформлении заказа / регистрации, неверна.
Была ли эта страница полезной?
0 / 5 — 0 рейтинги
NAME
gitattributes — Defining attributes per path
SYNOPSIS
$GIT_DIR/info/attributes, .gitattributes
DESCRIPTION
A gitattributes
file is a simple text file that gives
attributes
to pathnames.
Each line in gitattributes
file is of form:
That is, a pattern followed by an attributes list,
separated by whitespaces. Leading and trailing whitespaces are
ignored. Lines that begin with # are ignored. Patterns
that begin with a double quote are quoted in C style.
When the pattern matches the path in question, the attributes
listed on the line are given to the path.
Each attribute can be in one of these states for a given path:
- Set
-
The path has the attribute with special value «true»;
this is specified by listing only the name of the
attribute in the attribute list. - Unset
-
The path has the attribute with special value «false»;
this is specified by listing the name of the attribute
prefixed with a dash-
in the attribute list. - Set to a value
-
The path has the attribute with specified string value;
this is specified by listing the name of the attribute
followed by an equal sign=
and its value in the
attribute list. - Unspecified
-
No pattern matches the path, and nothing says if
the path has or does not have the attribute, the
attribute for the path is said to be Unspecified.
When more than one pattern matches the path, a later line
overrides an earlier line. This overriding is done per
attribute.
The rules by which the pattern matches paths are the same as in
.gitignore
files (see gitignore[5]), with a few exceptions:
-
negative patterns are forbidden
-
patterns that match a directory do not recursively match paths
inside that directory (so using the trailing-slashpath/
syntax is
pointless in an attributes file; usepath/**
instead)
When deciding what attributes are assigned to a path, Git
consults $GIT_DIR/info/attributes
file (which has the highest
precedence), .gitattributes
file in the same directory as the
path in question, and its parent directories up to the toplevel of the
work tree (the further the directory that contains .gitattributes
is from the path in question, the lower its precedence). Finally
global and system-wide files are considered (they have the lowest
precedence).
When the .gitattributes
file is missing from the work tree, the
path in the index is used as a fall-back. During checkout process,
.gitattributes
in the index is used and then the file in the
working tree is used as a fall-back.
If you wish to affect only a single repository (i.e., to assign
attributes to files that are particular to
one user’s workflow for that repository), then
attributes should be placed in the $GIT_DIR/info/attributes
file.
Attributes which should be version-controlled and distributed to other
repositories (i.e., attributes of interest to all users) should go into
.gitattributes
files. Attributes that should affect all repositories
for a single user should be placed in a file specified by the
core.attributesFile
configuration option (see git-config[1]).
Its default value is $XDG_CONFIG_HOME/git/attributes. If $XDG_CONFIG_HOME
is either not set or empty, $HOME/.config/git/attributes is used instead.
Attributes for all users on a system should be placed in the
$(prefix)/etc/gitattributes
file.
Sometimes you would need to override a setting of an attribute
for a path to Unspecified
state. This can be done by listing
the name of the attribute prefixed with an exclamation point !
.
EFFECTS
Certain operations by Git can be influenced by assigning
particular attributes to a path. Currently, the following
operations are attributes-aware.
Checking-out and checking-in
These attributes affect how the contents stored in the
repository are copied to the working tree files when commands
such as git switch, git checkout and git merge run.
They also affect how
Git stores the contents you prepare in the working tree in the
repository upon git add and git commit.
text
This attribute marks the path as a text file, which enables end-of-line
conversion: When a matching file is added to the index, the file’s line
endings are normalized to LF in the index. Conversely, when the file is
copied from the index to the working directory, its line endings may be
converted from LF to CRLF depending on the eol
attribute, the Git
config, and the platform (see explanation of eol
below).
- Set
-
Setting the
text
attribute on a path enables end-of-line
conversion on checkin and checkout as described above. Line endings
are normalized to LF in the index every time the file is checked in,
even if the file was previously added to Git with CRLF line endings. - Unset
-
Unsetting the
text
attribute on a path tells Git not to
attempt any end-of-line conversion upon checkin or checkout. - Set to string value «auto»
-
When
text
is set to «auto», Git decides by itself whether the file
is text or binary. If it is text and the file was not already in
Git with CRLF endings, line endings are converted on checkin and
checkout as described above. Otherwise, no conversion is done on
checkin or checkout. - Unspecified
-
If the
text
attribute is unspecified, Git uses the
core.autocrlf
configuration variable to determine if the
file should be converted.
Any other value causes Git to act as if text
has been left
unspecified.
eol
This attribute marks a path to use a specific line-ending style in the
working tree when it is checked out. It has effect only if text
or
text=auto
is set (see above), but specifying eol
automatically sets
text
if text
was left unspecified.
- Set to string value «crlf»
-
This setting converts the file’s line endings in the working
directory to CRLF when the file is checked out. - Set to string value «lf»
-
This setting uses the same line endings in the working directory as
in the index when the file is checked out. - Unspecified
-
If the
eol
attribute is unspecified for a file, its line endings
in the working directory are determined by thecore.autocrlf
or
core.eol
configuration variable (see the definitions of those
options in git-config[1]). Iftext
is set but neither of
those variables is, the default iseol=crlf
on Windows and
eol=lf
on all other platforms.
Backwards compatibility with crlf
attribute
For backwards compatibility, the crlf
attribute is interpreted as
follows:
crlf text -crlf -text crlf=input eol=lf
End-of-line conversion
While Git normally leaves file contents alone, it can be configured to
normalize line endings to LF in the repository and, optionally, to
convert them to CRLF when files are checked out.
If you simply want to have CRLF line endings in your working directory
regardless of the repository you are working with, you can set the
config variable «core.autocrlf» without using any attributes.
This does not force normalization of text files, but does ensure
that text files that you introduce to the repository have their line
endings normalized to LF when they are added, and that files that are
already normalized in the repository stay normalized.
If you want to ensure that text files that any contributor introduces to
the repository have their line endings normalized, you can set the
text
attribute to «auto» for all files.
The attributes allow a fine-grained control, how the line endings
are converted.
Here is an example that will make Git normalize .txt, .vcproj and .sh
files, ensure that .vcproj files have CRLF and .sh files have LF in
the working directory, and prevent .jpg files from being normalized
regardless of their content.
* text=auto *.txt text *.vcproj text eol=crlf *.sh text eol=lf *.jpg -text
Note |
When text=auto conversion is enabled in a cross-platformproject using push and pull to a central repository the text files containing CRLFs should be normalized. |
From a clean working directory:
$ echo "* text=auto" >.gitattributes $ git add --renormalize . $ git status # Show files that will be normalized $ git commit -m "Introduce end-of-line normalization"
If any files that should not be normalized show up in git status,
unset their text
attribute before running git add -u.
Conversely, text files that Git does not detect can have normalization
enabled manually.
If core.safecrlf
is set to «true» or «warn», Git verifies if
the conversion is reversible for the current setting of
core.autocrlf
. For «true», Git rejects irreversible
conversions; for «warn», Git only prints a warning but accepts
an irreversible conversion. The safety triggers to prevent such
a conversion done to the files in the work tree, but there are a
few exceptions. Even though…
-
git add itself does not touch the files in the work tree, the
next checkout would, so the safety triggers; -
git apply to update a text file with a patch does touch the files
in the work tree, but the operation is about text files and CRLF
conversion is about fixing the line ending inconsistencies, so the
safety does not trigger; -
git diff itself does not touch the files in the work tree, it is
often run to inspect the changes you intend to next git add. To
catch potential problems early, safety triggers.
working-tree-encoding
Git recognizes files encoded in ASCII or one of its supersets (e.g.
UTF-8, ISO-8859-1, …) as text files. Files encoded in certain other
encodings (e.g. UTF-16) are interpreted as binary and consequently
built-in Git text processing tools (e.g. git diff) as well as most Git
web front ends do not visualize the contents of these files by default.
In these cases you can tell Git the encoding of a file in the working
directory with the working-tree-encoding
attribute. If a file with this
attribute is added to Git, then Git re-encodes the content from the
specified encoding to UTF-8. Finally, Git stores the UTF-8 encoded
content in its internal data structure (called «the index»). On checkout
the content is re-encoded back to the specified encoding.
Please note that using the working-tree-encoding
attribute may have a
number of pitfalls:
-
Alternative Git implementations (e.g. JGit or libgit2) and older Git
versions (as of March 2018) do not support theworking-tree-encoding
attribute. If you decide to use theworking-tree-encoding
attribute
in your repository, then it is strongly recommended to ensure that all
clients working with the repository support it.For example, Microsoft Visual Studio resources files (
*.rc
) or
PowerShell script files (*.ps1
) are sometimes encoded in UTF-16.
If you declare*.ps1
as files as UTF-16 and you addfoo.ps1
with
aworking-tree-encoding
enabled Git client, thenfoo.ps1
will be
stored as UTF-8 internally. A client withoutworking-tree-encoding
support will checkoutfoo.ps1
as UTF-8 encoded file. This will
typically cause trouble for the users of this file.If a Git client that does not support the
working-tree-encoding
attribute adds a new filebar.ps1
, thenbar.ps1
will be
stored «as-is» internally (in this example probably as UTF-16).
A client withworking-tree-encoding
support will interpret the
internal contents as UTF-8 and try to convert it to UTF-16 on checkout.
That operation will fail and cause an error. -
Reencoding content to non-UTF encodings can cause errors as the
conversion might not be UTF-8 round trip safe. If you suspect your
encoding to not be round trip safe, then add it to
core.checkRoundtripEncoding
to make Git check the round trip
encoding (see git-config[1]). SHIFT-JIS (Japanese character
set) is known to have round trip issues with UTF-8 and is checked by
default. -
Reencoding content requires resources that might slow down certain
Git operations (e.g git checkout or git add).
Use the working-tree-encoding
attribute only if you cannot store a file
in UTF-8 encoding and if you want Git to be able to process the content
as text.
As an example, use the following attributes if your *.ps1 files are
UTF-16 encoded with byte order mark (BOM) and you want Git to perform
automatic line ending conversion based on your platform.
*.ps1 text working-tree-encoding=UTF-16
Use the following attributes if your *.ps1 files are UTF-16 little
endian encoded without BOM and you want Git to use Windows line endings
in the working directory (use UTF-16LE-BOM
instead of UTF-16LE
if
you want UTF-16 little endian with BOM).
Please note, it is highly recommended to
explicitly define the line endings with eol
if the working-tree-encoding
attribute is used to avoid ambiguity.
*.ps1 text working-tree-encoding=UTF-16LE eol=CRLF
You can get a list of all available encodings on your platform with the
following command:
If you do not know the encoding of a file, then you can use the file
command to guess the encoding:
ident
When the attribute ident
is set for a path, Git replaces
$Id$
in the blob object with $Id:
, followed by the
40-character hexadecimal blob object name, followed by a dollar
sign $
upon checkout. Any byte sequence that begins with
$Id:
and ends with $
in the worktree file is replaced
with $Id$
upon check-in.
filter
A filter
attribute can be set to a string value that names a
filter driver specified in the configuration.
A filter driver consists of a clean
command and a smudge
command, either of which can be left unspecified. Upon
checkout, when the smudge
command is specified, the command is
fed the blob object from its standard input, and its standard
output is used to update the worktree file. Similarly, the
clean
command is used to convert the contents of worktree file
upon checkin. By default these commands process only a single
blob and terminate. If a long running process
filter is used
in place of clean
and/or smudge
filters, then Git can process
all blobs with a single filter command invocation for the entire
life of a single Git command, for example git add --all
. If a
long running process
filter is configured then it always takes
precedence over a configured single blob filter. See section
below for the description of the protocol used to communicate with
a process
filter.
One use of the content filtering is to massage the content into a shape
that is more convenient for the platform, filesystem, and the user to use.
For this mode of operation, the key phrase here is «more convenient» and
not «turning something unusable into usable». In other words, the intent
is that if someone unsets the filter driver definition, or does not have
the appropriate filter program, the project should still be usable.
Another use of the content filtering is to store the content that cannot
be directly used in the repository (e.g. a UUID that refers to the true
content stored outside Git, or an encrypted content) and turn it into a
usable form upon checkout (e.g. download the external content, or decrypt
the encrypted content).
These two filters behave differently, and by default, a filter is taken as
the former, massaging the contents into more convenient shape. A missing
filter driver definition in the config, or a filter driver that exits with
a non-zero status, is not an error but makes the filter a no-op passthru.
You can declare that a filter turns a content that by itself is unusable
into a usable content by setting the filter.<driver>.required configuration
variable to true
.
Note: Whenever the clean filter is changed, the repo should be renormalized:
$ git add —renormalize .
For example, in .gitattributes, you would assign the filter
attribute for paths.
Then you would define a «filter.indent.clean» and «filter.indent.smudge»
configuration in your .git/config to specify a pair of commands to
modify the contents of C programs when the source files are checked
in («clean» is run) and checked out (no change is made because the
command is «cat»).
[filter "indent"] clean = indent smudge = cat
For best results, clean
should not alter its output further if it is
run twice («clean→clean» should be equivalent to «clean»), and
multiple smudge
commands should not alter clean
‘s output
(«smudge→smudge→clean» should be equivalent to «clean»). See the
section on merging below.
The «indent» filter is well-behaved in this regard: it will not modify
input that is already correctly indented. In this case, the lack of a
smudge filter means that the clean filter must accept its own output
without modifying it.
If a filter must succeed in order to make the stored contents usable,
you can declare that the filter is required
, in the configuration:
[filter "crypt"] clean = openssl enc ... smudge = openssl enc -d ... required
Sequence «%f» on the filter command line is replaced with the name of
the file the filter is working on. A filter might use this in keyword
substitution. For example:
[filter "p4"] clean = git-p4-filter --clean %f smudge = git-p4-filter --smudge %f
Note that «%f» is the name of the path that is being worked on. Depending
on the version that is being filtered, the corresponding file on disk may
not exist, or may have different contents. So, smudge and clean commands
should not try to access the file on disk, but only act as filters on the
content provided to them on standard input.
Long Running Filter Process
If the filter command (a string value) is defined via
filter.<driver>.process
then Git can process all blobs with a
single filter invocation for the entire life of a single Git
command. This is achieved by using the long-running process protocol
(described in technical/long-running-process-protocol.txt).
When Git encounters the first file that needs to be cleaned or smudged,
it starts the filter and performs the handshake. In the handshake, the
welcome message sent by Git is «git-filter-client», only version 2 is
supported, and the supported capabilities are «clean», «smudge», and
«delay».
Afterwards Git sends a list of «key=value» pairs terminated with
a flush packet. The list will contain at least the filter command
(based on the supported capabilities) and the pathname of the file
to filter relative to the repository root. Right after the flush packet
Git sends the content split in zero or more pkt-line packets and a
flush packet to terminate content. Please note, that the filter
must not send any response before it received the content and the
final flush packet. Also note that the «value» of a «key=value» pair
can contain the «=» character whereas the key would never contain
that character.
packet: git> command=smudge packet: git> pathname=path/testfile.dat packet: git> 0000 packet: git> CONTENT packet: git> 0000
The filter is expected to respond with a list of «key=value» pairs
terminated with a flush packet. If the filter does not experience
problems then the list must contain a «success» status. Right after
these packets the filter is expected to send the content in zero
or more pkt-line packets and a flush packet at the end. Finally, a
second list of «key=value» pairs terminated with a flush packet
is expected. The filter can change the status in the second list
or keep the status as is with an empty list. Please note that the
empty list must be terminated with a flush packet regardless.
packet: git< status=success packet: git< 0000 packet: git< SMUDGED_CONTENT packet: git< 0000 packet: git< 0000 # empty list, keep "status=success" unchanged!
If the result content is empty then the filter is expected to respond
with a «success» status and a flush packet to signal the empty content.
packet: git< status=success packet: git< 0000 packet: git< 0000 # empty content! packet: git< 0000 # empty list, keep "status=success" unchanged!
In case the filter cannot or does not want to process the content,
it is expected to respond with an «error» status.
packet: git< status=error packet: git< 0000
If the filter experiences an error during processing, then it can
send the status «error» after the content was (partially or
completely) sent.
packet: git< status=success packet: git< 0000 packet: git< HALF_WRITTEN_ERRONEOUS_CONTENT packet: git< 0000 packet: git< status=error packet: git< 0000
In case the filter cannot or does not want to process the content
as well as any future content for the lifetime of the Git process,
then it is expected to respond with an «abort» status at any point
in the protocol.
packet: git< status=abort packet: git< 0000
Git neither stops nor restarts the filter process in case the
«error»/»abort» status is set. However, Git sets its exit code
according to the filter.<driver>.required
flag, mimicking the
behavior of the filter.<driver>.clean
/ filter.<driver>.smudge
mechanism.
If the filter dies during the communication or does not adhere to
the protocol then Git will stop the filter process and restart it
with the next file that needs to be processed. Depending on the
filter.<driver>.required
flag Git will interpret that as error.
Delay
If the filter supports the «delay» capability, then Git can send the
flag «can-delay» after the filter command and pathname. This flag
denotes that the filter can delay filtering the current blob (e.g. to
compensate network latencies) by responding with no content but with
the status «delayed» and a flush packet.
packet: git> command=smudge packet: git> pathname=path/testfile.dat packet: git> can-delay=1 packet: git> 0000 packet: git> CONTENT packet: git> 0000 packet: git< status=delayed packet: git< 0000
If the filter supports the «delay» capability then it must support the
«list_available_blobs» command. If Git sends this command, then the
filter is expected to return a list of pathnames representing blobs
that have been delayed earlier and are now available.
The list must be terminated with a flush packet followed
by a «success» status that is also terminated with a flush packet. If
no blobs for the delayed paths are available, yet, then the filter is
expected to block the response until at least one blob becomes
available. The filter can tell Git that it has no more delayed blobs
by sending an empty list. As soon as the filter responds with an empty
list, Git stops asking. All blobs that Git has not received at this
point are considered missing and will result in an error.
packet: git> command=list_available_blobs packet: git> 0000 packet: git< pathname=path/testfile.dat packet: git< pathname=path/otherfile.dat packet: git< 0000 packet: git< status=success packet: git< 0000
After Git received the pathnames, it will request the corresponding
blobs again. These requests contain a pathname and an empty content
section. The filter is expected to respond with the smudged content
in the usual way as explained above.
packet: git> command=smudge packet: git> pathname=path/testfile.dat packet: git> 0000 packet: git> 0000 # empty content! packet: git< status=success packet: git< 0000 packet: git< SMUDGED_CONTENT packet: git< 0000 packet: git< 0000 # empty list, keep "status=success" unchanged!
Example
A long running filter demo implementation can be found in
contrib/long-running-filter/example.pl
located in the Git
core repository. If you develop your own long running filter
process then the GIT_TRACE_PACKET
environment variables can be
very helpful for debugging (see git[1]).
Please note that you cannot use an existing filter.<driver>.clean
or filter.<driver>.smudge
command with filter.<driver>.process
because the former two use a different inter process communication
protocol than the latter one.
Interaction between checkin/checkout attributes
In the check-in codepath, the worktree file is first converted
with filter
driver (if specified and corresponding driver
defined), then the result is processed with ident
(if
specified), and then finally with text
(again, if specified
and applicable).
In the check-out codepath, the blob content is first converted
with text
, and then ident
and fed to filter
.
Merging branches with differing checkin/checkout attributes
If you have added attributes to a file that cause the canonical
repository format for that file to change, such as adding a
clean/smudge filter or text/eol/ident attributes, merging anything
where the attribute is not in place would normally cause merge
conflicts.
To prevent these unnecessary merge conflicts, Git can be told to run a
virtual check-out and check-in of all three stages of a file when
resolving a three-way merge by setting the merge.renormalize
configuration variable. This prevents changes caused by check-in
conversion from causing spurious merge conflicts when a converted file
is merged with an unconverted file.
As long as a «smudge→clean» results in the same output as a «clean»
even on files that are already smudged, this strategy will
automatically resolve all filter-related conflicts. Filters that do
not act in this way may cause additional merge conflicts that must be
resolved manually.
Generating diff text
diff
The attribute diff
affects how Git generates diffs for particular
files. It can tell Git whether to generate a textual patch for the path
or to treat the path as a binary file. It can also affect what line is
shown on the hunk header @@ -k,l +n,m @@
line, tell Git to use an
external command to generate the diff, or ask Git to convert binary
files to a text format before generating the diff.
- Set
-
A path to which the
diff
attribute is set is treated
as text, even when they contain byte values that
normally never appear in text files, such as NUL. - Unset
-
A path to which the
diff
attribute is unset will
generateBinary files differ
(or a binary patch, if
binary patches are enabled). - Unspecified
-
A path to which the
diff
attribute is unspecified
first gets its contents inspected, and if it looks like
text and is smaller than core.bigFileThreshold, it is treated
as text. Otherwise it would generateBinary files differ
. - String
-
Diff is shown using the specified diff driver. Each driver may
specify one or more options, as described in the following
section. The options for the diff driver «foo» are defined
by the configuration variables in the «diff.foo» section of the
Git config file.
Defining an external diff driver
The definition of a diff driver is done in gitconfig
, not
gitattributes
file, so strictly speaking this manual page is a
wrong place to talk about it. However…
To define an external diff driver jcdiff
, add a section to your
$GIT_DIR/config
file (or $HOME/.gitconfig
file) like this:
[diff "jcdiff"] command = j-c-diff
When Git needs to show you a diff for the path with diff
attribute set to jcdiff
, it calls the command you specified
with the above configuration, i.e. j-c-diff
, with 7
parameters, just like GIT_EXTERNAL_DIFF
program is called.
See git[1] for details.
Setting the internal diff algorithm
The diff algorithm can be set through the diff.algorithm
config key, but
sometimes it may be helpful to set the diff algorithm per path. For example,
one may want to use the minimal
diff algorithm for .json files, and the
histogram
for .c files, and so on without having to pass in the algorithm
through the command line each time.
First, in .gitattributes
, assign the diff
attribute for paths.
Then, define a «diff.<name>.algorithm» configuration to specify the diff
algorithm, choosing from myers
, patience
, minimal
, or histogram
.
[diff "<name>"] algorithm = histogram
This diff algorithm applies to user facing diff output like git-diff(1),
git-show(1) and is used for the --stat
output as well. The merge machinery
will not use the diff algorithm set through this method.
Note |
If diff.<name>.command is defined for path with thediff=<name> attribute, it is executed as an external diff driver(see above), and adding diff.<name>.algorithm has no effect, as thealgorithm is not passed to the external diff driver. |
Each group of changes (called a «hunk») in the textual diff output
is prefixed with a line of the form:
This is called a hunk header. The «TEXT» portion is by default a line
that begins with an alphabet, an underscore or a dollar sign; this
matches what GNU diff -p output uses. This default selection however
is not suited for some contents, and you can use a customized pattern
to make a selection.
First, in .gitattributes, you would assign the diff
attribute
for paths.
Then, you would define a «diff.tex.xfuncname» configuration to
specify a regular expression that matches a line that you would
want to appear as the hunk header «TEXT». Add a section to your
$GIT_DIR/config
file (or $HOME/.gitconfig
file) like this:
[diff "tex"] xfuncname = "^(\\\\(sub)*section\\{.*)$"
Note. A single level of backslashes are eaten by the
configuration file parser, so you would need to double the
backslashes; the pattern above picks a line that begins with a
backslash, and zero or more occurrences of sub
followed by
section
followed by open brace, to the end of line.
There are a few built-in patterns to make this easier, and tex
is one of them, so you do not have to write the above in your
configuration file (you still need to enable this with the
attribute mechanism, via .gitattributes
). The following built in
patterns are available:
-
ada
suitable for source code in the Ada language. -
bash
suitable for source code in the Bourne-Again SHell language.
Covers a superset of POSIX shell function definitions. -
bibtex
suitable for files with BibTeX coded references. -
cpp
suitable for source code in the C and C++ languages. -
csharp
suitable for source code in the C# language. -
css
suitable for cascading style sheets. -
dts
suitable for devicetree (DTS) files. -
elixir
suitable for source code in the Elixir language. -
fortran
suitable for source code in the Fortran language. -
fountain
suitable for Fountain documents. -
golang
suitable for source code in the Go language. -
html
suitable for HTML/XHTML documents. -
java
suitable for source code in the Java language. -
kotlin
suitable for source code in the Kotlin language. -
markdown
suitable for Markdown documents. -
matlab
suitable for source code in the MATLAB and Octave languages. -
objc
suitable for source code in the Objective-C language. -
pascal
suitable for source code in the Pascal/Delphi language. -
perl
suitable for source code in the Perl language. -
php
suitable for source code in the PHP language. -
python
suitable for source code in the Python language. -
ruby
suitable for source code in the Ruby language. -
rust
suitable for source code in the Rust language. -
scheme
suitable for source code in the Scheme language. -
tex
suitable for source code for LaTeX documents.
Customizing word diff
You can customize the rules that git diff --word-diff
uses to
split words in a line, by specifying an appropriate regular expression
in the «diff.*.wordRegex» configuration variable. For example, in TeX
a backslash followed by a sequence of letters forms a command, but
several such commands can be run together without intervening
whitespace. To separate them, use a regular expression in your
$GIT_DIR/config
file (or $HOME/.gitconfig
file) like this:
[diff "tex"] wordRegex = "\\\\[a-zA-Z]+|[{}]|\\\\.|[^\\{}[:space:]]+"
A built-in pattern is provided for all languages listed in the
previous section.
Performing text diffs of binary files
Sometimes it is desirable to see the diff of a text-converted
version of some binary files. For example, a word processor
document can be converted to an ASCII text representation, and
the diff of the text shown. Even though this conversion loses
some information, the resulting diff is useful for human
viewing (but cannot be applied directly).
The textconv
config option is used to define a program for
performing such a conversion. The program should take a single
argument, the name of a file to convert, and produce the
resulting text on stdout.
For example, to show the diff of the exif information of a
file instead of the binary information (assuming you have the
exif tool installed), add the following section to your
$GIT_DIR/config
file (or $HOME/.gitconfig
file):
[diff "jpg"] textconv = exif
Note |
The text conversion is generally a one-way conversion; in this example, we lose the actual image contents and focus just on the text data. This means that diffs generated by textconv are not suitable for applying. For this reason, only git diff and the git log family of commands (i.e.,log, whatchanged, show) will perform text conversion. git will never generate this output. If you want tosend somebody a text-converted diff of a binary file (e.g., because it quickly conveys the changes you have made), you should generate it separately and send it as a comment in addition to the usual binary diff that you might send. |
Because text conversion can be slow, especially when doing a
large number of them with git log -p
, Git provides a mechanism
to cache the output and use it in future diffs. To enable
caching, set the «cachetextconv» variable in your diff driver’s
config. For example:
[diff "jpg"] textconv = exif cachetextconv = true
This will cache the result of running «exif» on each blob
indefinitely. If you change the textconv config variable for a
diff driver, Git will automatically invalidate the cache entries
and re-run the textconv filter. If you want to invalidate the
cache manually (e.g., because your version of «exif» was updated
and now produces better output), you can remove the cache
manually with git update-ref -d refs/notes/textconv/jpg
(where
«jpg» is the name of the diff driver, as in the example above).
Choosing textconv versus external diff
If you want to show differences between binary or specially-formatted
blobs in your repository, you can choose to use either an external diff
command, or to use textconv to convert them to a diff-able text format.
Which method you choose depends on your exact situation.
The advantage of using an external diff command is flexibility. You are
not bound to find line-oriented changes, nor is it necessary for the
output to resemble unified diff. You are free to locate and report
changes in the most appropriate way for your data format.
A textconv, by comparison, is much more limiting. You provide a
transformation of the data into a line-oriented text format, and Git
uses its regular diff tools to generate the output. There are several
advantages to choosing this method:
-
Ease of use. It is often much simpler to write a binary to text
transformation than it is to perform your own diff. In many cases,
existing programs can be used as textconv filters (e.g., exif,
odt2txt). -
Git diff features. By performing only the transformation step
yourself, you can still utilize many of Git’s diff features,
including colorization, word-diff, and combined diffs for merges. -
Caching. Textconv caching can speed up repeated diffs, such as those
you might trigger by runninggit log -p
.
Marking files as binary
Git usually guesses correctly whether a blob contains text or binary
data by examining the beginning of the contents. However, sometimes you
may want to override its decision, either because a blob contains binary
data later in the file, or because the content, while technically
composed of text characters, is opaque to a human reader. For example,
many postscript files contain only ASCII characters, but produce noisy
and meaningless diffs.
The simplest way to mark a file as binary is to unset the diff
attribute in the .gitattributes
file:
This will cause Git to generate Binary files differ
(or a binary
patch, if binary patches are enabled) instead of a regular diff.
However, one may also want to specify other diff driver attributes. For
example, you might want to use textconv
to convert postscript files to
an ASCII representation for human viewing, but otherwise treat them as
binary files. You cannot specify both -diff
and diff=ps
attributes.
The solution is to use the diff.*.binary
config option:
[diff "ps"] textconv = ps2ascii binary = true
Performing a three-way merge
merge
The attribute merge
affects how three versions of a file are
merged when a file-level merge is necessary during git merge
,
and other commands such as git revert
and git cherry-pick
.
- Set
-
Built-in 3-way merge driver is used to merge the
contents in a way similar to merge command ofRCS
suite. This is suitable for ordinary text files. - Unset
-
Take the version from the current branch as the
tentative merge result, and declare that the merge has
conflicts. This is suitable for binary files that do
not have a well-defined merge semantics. - Unspecified
-
By default, this uses the same built-in 3-way merge
driver as is the case when themerge
attribute is set.
However, themerge.default
configuration variable can name
different merge driver to be used with paths for which the
merge
attribute is unspecified. - String
-
3-way merge is performed using the specified custom
merge driver. The built-in 3-way merge driver can be
explicitly specified by asking for «text» driver; the
built-in «take the current branch» driver can be
requested with «binary».
Built-in merge drivers
There are a few built-in low-level merge drivers defined that
can be asked for via the merge
attribute.
- text
-
Usual 3-way file level merge for text files. Conflicted
regions are marked with conflict markers<<<<<<<
,
=======
and>>>>>>>
. The version from your branch
appears before the=======
marker, and the version
from the merged branch appears after the=======
marker. - binary
-
Keep the version from your branch in the work tree, but
leave the path in the conflicted state for the user to
sort out. - union
-
Run 3-way file level merge for text files, but take
lines from both versions, instead of leaving conflict
markers. This tends to leave the added lines in the
resulting file in random order and the user should
verify the result. Do not use this if you do not
understand the implications.
Defining a custom merge driver
The definition of a merge driver is done in the .git/config
file, not in the gitattributes
file, so strictly speaking this
manual page is a wrong place to talk about it. However…
To define a custom merge driver filfre
, add a section to your
$GIT_DIR/config
file (or $HOME/.gitconfig
file) like this:
[merge "filfre"] name = feel-free merge driver driver = filfre %O %A %B %L %P recursive = binary
The merge.*.name
variable gives the driver a human-readable
name.
The merge.*.driver
variable’s value is used to construct a
command to run to merge ancestor’s version (%O
), current
version (%A
) and the other branches’ version (%B
). These
three tokens are replaced with the names of temporary files that
hold the contents of these versions when the command line is
built. Additionally, %L will be replaced with the conflict marker
size (see below).
The merge driver is expected to leave the result of the merge in
the file named with %A
by overwriting it, and exit with zero
status if it managed to merge them cleanly, or non-zero if there
were conflicts. When the driver crashes (e.g. killed by SEGV),
it is expected to exit with non-zero status that are higher than
128, and in such a case, the merge results in a failure (which is
different from producing a conflict).
The merge.*.recursive
variable specifies what other merge
driver to use when the merge driver is called for an internal
merge between common ancestors, when there are more than one.
When left unspecified, the driver itself is used for both
internal merge and the final merge.
The merge driver can learn the pathname in which the merged result
will be stored via placeholder %P
.
conflict-marker-size
This attribute controls the length of conflict markers left in
the work tree file during a conflicted merge. Only setting to
the value to a positive integer has any meaningful effect.
For example, this line in .gitattributes
can be used to tell the merge
machinery to leave much longer (instead of the usual 7-character-long)
conflict markers when merging the file Documentation/git-merge.txt
results in a conflict.
Documentation/git-merge.txt conflict-marker-size=32
Checking whitespace errors
whitespace
The core.whitespace
configuration variable allows you to define what
diff and apply should consider whitespace errors for all paths in
the project (See git-config[1]). This attribute gives you finer
control per path.
- Set
-
Notice all types of potential whitespace errors known to Git.
The tab width is taken from the value of thecore.whitespace
configuration variable. - Unset
-
Do not notice anything as error.
- Unspecified
-
Use the value of the
core.whitespace
configuration variable to
decide what to notice as error. - String
-
Specify a comma separated list of common whitespace problems to
notice in the same format as thecore.whitespace
configuration
variable.
Creating an archive
export-ignore
Files and directories with the attribute export-ignore
won’t be added to
archive files.
export-subst
If the attribute export-subst
is set for a file then Git will expand
several placeholders when adding this file to an archive. The
expansion depends on the availability of a commit ID, i.e., if
git-archive[1] has been given a tree instead of a commit or a
tag then no replacement will be done. The placeholders are the same
as those for the option --pretty=format:
of git-log[1],
except that they need to be wrapped like this: $Format:PLACEHOLDERS$
in the file. E.g. the string $Format:%H$
will be replaced by the
commit hash. However, only one %(describe)
placeholder is expanded
per archive to avoid denial-of-service attacks.
Packing objects
delta
Delta compression will not be attempted for blobs for paths with the
attribute delta
set to false.
Viewing files in GUI tools
encoding
The value of this attribute specifies the character encoding that should
be used by GUI tools (e.g. gitk[1] and git-gui[1]) to
display the contents of the relevant file. Note that due to performance
considerations gitk[1] does not use this attribute unless you
manually enable per-file encodings in its options.
If this attribute is not set or has an invalid value, the value of the
gui.encoding
configuration variable is used instead
(See git-config[1]).
USING MACRO ATTRIBUTES
You do not want any end-of-line conversions applied to, nor textual diffs
produced for, any binary file you track. You would need to specify e.g.
but that may become cumbersome, when you have many attributes. Using
macro attributes, you can define an attribute that, when set, also
sets or unsets a number of other attributes at the same time. The
system knows a built-in macro attribute, binary
:
Setting the «binary» attribute also unsets the «text» and «diff»
attributes as above. Note that macro attributes can only be «Set»,
though setting one might have the effect of setting or unsetting other
attributes or even returning other attributes to the «Unspecified»
state.
DEFINING MACRO ATTRIBUTES
Custom macro attributes can be defined only in top-level gitattributes
files ($GIT_DIR/info/attributes
, the .gitattributes
file at the
top level of the working tree, or the global or system-wide
gitattributes files), not in .gitattributes
files in working tree
subdirectories. The built-in macro attribute «binary» is equivalent
to:
[attr]binary -diff -merge -text
NOTES
Git does not follow symbolic links when accessing a .gitattributes
file in the working tree. This keeps behavior consistent when the file
is accessed from the index or a tree versus from the filesystem.
EXAMPLES
If you have these three gitattributes
file:
(in $GIT_DIR/info/attributes) a* foo !bar -baz (in .gitattributes) abc foo bar baz (in t/.gitattributes) ab* merge=filfre abc -foo -bar *.c frotz
the attributes given to path t/abc
are computed as follows:
-
By examining
t/.gitattributes
(which is in the same
directory as the path in question), Git finds that the first
line matches.merge
attribute is set. It also finds that
the second line matches, and attributesfoo
andbar
are unset. -
Then it examines
.gitattributes
(which is in the parent
directory), and finds that the first line matches, but
t/.gitattributes
file already decided howmerge
,foo
andbar
attributes should be given to this path, so it
leavesfoo
andbar
unset. Attributebaz
is set. -
Finally it examines
$GIT_DIR/info/attributes
. This file
is used to override the in-tree settings. The first line is
a match, andfoo
is set,bar
is reverted to unspecified
state, andbaz
is unset.
As the result, the attributes assignment to t/abc
becomes:
foo set to true bar unspecified baz set to false merge set to string value "filfre" frotz unspecified
SEE ALSO
Git tries to help translate line endings between operating systems with different standards. This gets sooo frustrating. Here’s what I always want:
On Windows:
git config --global core.autocrlf input
This says, “If I commit a file with the wrong line endings, fix it before other people notice.” Otherwise, leave it alone.
On Linux, Mac, etc:
git config --global core.autocrlf false
This says, “Don’t screw with the line endings.”
Nowhere:
git config --global core.autocrlf true
This says, “Screw with the line endings. Make them all include carriage return on my filesystem, but not have carriage return when I push to the shared repository.” This is not necessary.
Windows and Linux on the same files:
This happens when you’re running Linux in a docker container and mounting files that are stored on Windows. Generally, stick with the Windows strategy of core.autocrlf=input
, unless you have .bat
or .cmd
(Windows executables) in your repository.
The VS Code docs have tips for this case. They suggest setting up the repository with a .gitattributes
file that says “mostly use LF as line endings, but .bat
and .cmd
files need CR+LF”:
* text=auto eol=lf *.{cmd,[cC][mM][dD]} text eol=crlf *.{bat,[bB][aA][tT]} text eol=crlf
If VSCode insists on putting CR (pictured as ^M in the git diff) in a file, then open the file and check the lower right-hand corner, in the status bar. Does it say “CRLF”? Click that and choose “LF” instead. Do things right, VSCode 😠
Troubleshooting
When git is surprising you:
Check for overrides
Within a repository, the .gitattributes
file can override the autocrlf
behavior for all files or sets of files. Watch out for the text
and eol
attributes. It is incredibly complicated.
Check your settings
To find out which one is in effect for new clones:git config --global --get core.autocrlf
Or in one repository of interest:git config --local --get core.autocrlf
Why is it set that way? Find out:git config --list --show-origin
This shows all the places the settings are set. Including duplicates — it’s OK for there to be multiple entries for one setting.
Why does this even exist?
Historical reasons, of course! (If you have a Ruby Tapas subscription, there’s a great little history lesson on this.)
Back in the day, many Windows programs expected files to have line endings marked with CR+LF characters (carriage return + line feed, or \r\n
). These days, these programs work fine with either CR+LF or with LF alone. Meanwhile, Linux/Mac programs expect LF alone.
Use LF alone! There’s no reason to include the CR characters, even if you’re working on Windows.
One danger: new files created in programs like Notepad get CR+LF. Those files look like they have \r
on every line when viewed in Linux/Mac programs or (in code) read into strings and split on \n
.
That’s why, on Windows, it makes sense to ask git to change line endings from CR+LF to LF on files that it saves. core.autocrlf=input
says, screw with the line endings only in one direction. Don’t add CR, but do take it away before other people see it.
Postscript
I love ternary booleans like this: true, false, input. Hilarious! This illustrates: don’t use booleans in your interfaces. Use enums instead. Names are useful. autocrlf=ScrewWithLineEndings|GoAway|HideMyCRs