Line breaks in Windows and Linux

Different operating systems mark the end of a line differently: GNU/Linux uses \n, the classic Apple Macintosh (Mac OS up to version 9) uses \r, and Microsoft Windows uses \r\n.

Keep this in mind when writing regular expression patterns for PHP's PCRE functions. To make parsing work regardless of the platform the text came from, you can use the universal escape sequence "\R" instead, which matches any of the three variants listed above:

<?php

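// One sample string per line-ending convention.
$string_n = "\n";      // LF (GNU/Linux)
$string_r = "\r";      // CR (classic Mac OS)
$string_rn = "\r\n";   // CR+LF (Windows)

// "\n" and "\r" are each matched by a single \R.
var_dump( preg_match( '=\R=', $string_n ) );        // int(1)
var_dump( preg_match( '=\R=', $string_r ) );        // int(1)
// \R consumes "\r\n" as one unit, so two consecutive \R find no match.
var_dump( preg_match( '=\R{2}=', $string_rn ) );    // int(0)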

?>

The last match in the code above uses the pattern '=\R{2}=' to show that the escape sequence "\R" treats the sequence "\r\n" as a single unit.
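The same idea can be sketched outside PHP. Python's built-in `re` module has no `\R`, so the sketch below spells out the alternatives by hand. The name `NEWLINE` and the sample string are assumptions made for illustration, and only the three sequences discussed above are covered.

```python
import re

# Hand-written stand-in for PCRE's \R, limited to the three sequences
# discussed above. "\r\n" comes first so it is consumed as one unit
# rather than as a separate "\r" followed by "\n".
NEWLINE = re.compile(r"\r\n|\r|\n")

sample = "one\ntwo\rthree\r\nfour"
lines = NEWLINE.split(sample)
print(lines)  # ['one', 'two', 'three', 'four']

# A Windows line ending counts as a single line break, not two.
print(len(NEWLINE.findall("\r\n")))  # 1
```

For plain line splitting, `str.splitlines()` is usually enough, but it also splits on several other characters, so an explicit pattern keeps the behaviour closer to the three cases described here.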


16.12.2014

Source

This is how a newline is represented in the Windows (\r\n) and Linux (\n) operating systems.

On Unix, \r is a Carriage Return (CR) and \n is a Line Feed (LF). Together
they happen to be the Windows newline identifier, and you are replacing
them with the Unix newline identifier.

On Windows, \r is a CR too, but \n written through a text-mode filehandle
comes out as the combination CR+LF. So effectively you are trying to
replace CR+CR+LF with CR+LF. That doesn't make much sense, does it?

From "perldoc perlop":
All systems use the virtual "\n" to represent a line terminator,
called a "newline". There is no such thing as an unvarying, physical
newline character. It is only an illusion that the operating system,
device drivers, C libraries, and Perl all conspire to preserve. Not all
systems read "\r" as ASCII CR and "\n" as ASCII LF. For example, on
a Mac, these are reversed, and on systems without line terminator,
printing "\n" may emit no actual data. In general, use "\n" when you
mean a "newline" for your system, but use the literal ASCII when you
need an exact character. For example, most networking protocols expect
and prefer a CR+LF ("\015\012" or "\cM\cJ") for line terminators,
and although they often accept just "\012", they seldom tolerate just
"\015". If you get in the habit of using "\n" for networking, you
may be burned some day.
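The advice about networking can be illustrated with a small sketch, written here in Python rather than Perl. The function name `build_request`, the host `example.com` and the choice of headers are assumptions for illustration only. The point is that the terminator is written as an explicit CR+LF rather than left to the platform.

```python
CRLF = "\r\n"  # explicit ASCII CR + LF, the same on every platform

def build_request(host: str, path: str = "/") -> bytes:
    """Assemble a minimal HTTP/1.1 GET request with explicit CR+LF endings."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        "Connection: close",
    ]
    # Each line ends with CR+LF; an extra empty line closes the header block.
    return (CRLF.join(lines) + CRLF + CRLF).encode("ascii")

request = build_request("example.com")
print(request)
# b'GET / HTTP/1.1\r\nHost: example.com\r\nConnection: close\r\n\r\n'
```

The request is built as bytes so that no text-mode translation can change the line endings before they reach the socket.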

Source

Line endings on Windows and Linux are one of the details programmers regularly run into in their projects. This article looks at how the two systems differ and offers some practical recommendations.

A line break marks the end of a line in a text file, and different operating systems use different characters for it. Windows uses the two characters CR and LF, written '\r\n'. Linux uses only LF, written '\n'.

The difference is historical. Unix-like systems (including Linux) have always marked a line break with the single character LF. Microsoft's DOS, which became the foundation for Windows, used the pair CR and LF instead, a convention that earlier systems such as CP/M already followed. The pair goes back to printers and teletypes, which needed both characters to move to the start of a new line.

Whatever the historical reasons, each convention has since acquired its own practices and its own arguments in its favour.

The CR and LF pair used on Windows has its advantages. First, it stays compatible with printers that expect both characters in order to output text correctly. Second, the pair states explicitly what should happen: CR moves the cursor to the start of the line and LF moves it down one line.

On the other hand, using only LF on Unix-like systems (Linux included) simplifies working with text files. Each line ending takes one byte instead of two, so files are slightly smaller, and programs have only a single character to look for when splitting text into lines.

In simple text files the difference often goes unnoticed. Compilers and interpreters usually handle both conventions and recognise the end of a line correctly when compiling or running a program.

Problems can still appear when files move between operating systems. If a text file with Windows line endings is opened on a Unix-like system, every line ends with a stray CR, which many tools display as ^M. Conversely, if a file created on a Unix-like system is opened in a Windows program that expects CR+LF (older versions of Notepad, for example), the line breaks are not recognised and the whole text appears on a single line.
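Before opening a file from another system, it can help to check which convention it actually uses. The following is a minimal sketch in Python. The function name `count_line_endings` and the sample bytes are assumptions for illustration, and the file contents are assumed to be already read as bytes.

```python
def count_line_endings(data: bytes) -> dict:
    """Count CR+LF pairs, lone CRs and lone LFs in raw file contents."""
    crlf = data.count(b"\r\n")
    # Every CR+LF pair contains one CR and one LF, so subtract the pairs
    # to get the characters that stand alone.
    return {
        "CRLF": crlf,
        "CR": data.count(b"\r") - crlf,
        "LF": data.count(b"\n") - crlf,
    }

sample = b"first\r\nsecond\r\nthird\n"
print(count_line_endings(sample))  # {'CRLF': 2, 'CR': 0, 'LF': 1}
```

A file with more than one non-zero count has mixed line endings, which is a common result of editing the same file on several systems.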

There are several ways to avoid these problems. One is to let the language or runtime handle the line ending. In Python, files opened in text mode translate line endings by default, so '\n' in program code works on both systems, and os.linesep holds the platform's own separator. Similarly, Java provides System.lineSeparator(), and in C++ writing '\n' to a stream opened in text mode lets the runtime emit the platform's line ending.
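As a hedged illustration of how one language handles this, the sketch below uses Python's built-in `open()`. In text mode it translates line endings on read by default, and passing `newline=""` turns that translation off. The file name `mixed.txt` and the sample contents are assumptions for illustration.

```python
import os
import tempfile

# Raw bytes that mix all three conventions: CR+LF, lone CR, lone LF.
raw = b"first\r\nsecond\rthird\n"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "mixed.txt")
    with open(path, "wb") as f:
        f.write(raw)

    # Default text mode: "\r\n" and "\r" are translated to "\n" on read.
    with open(path, "r", encoding="ascii") as f:
        translated = f.read()

    # newline="" disables translation, so the original endings are kept.
    with open(path, "r", encoding="ascii", newline="") as f:
        untranslated = f.read()

print(repr(translated))    # 'first\nsecond\nthird\n'
print(repr(untranslated))  # 'first\r\nsecond\rthird\n'
print(repr(os.linesep))    # the platform's own separator
```

Reading with the default settings is usually the simplest choice. `newline=""` is useful when the original line endings have to be preserved, for example when a file is edited in place.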

Another option is to convert line endings with a utility. For example, `dos2unix` on Unix-like systems converts text files with CR and LF line endings to files with LF only, and `unix2dos` performs the reverse conversion.
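Where those utilities are not available, the core of the conversion can be approximated in a few lines. This is only a sketch in Python, not a reimplementation of the real tools. The function names `to_unix` and `to_dos` are hypothetical, and the data is assumed to be raw file contents.

```python
def to_unix(data: bytes) -> bytes:
    """Replace every CR+LF pair with a single LF."""
    return data.replace(b"\r\n", b"\n")

def to_dos(data: bytes) -> bytes:
    """Replace every LF with CR+LF without doubling existing CR+LF pairs."""
    # Normalising to LF first keeps an existing "\r\n" from becoming "\r\r\n".
    return to_unix(data).replace(b"\n", b"\r\n")

print(to_unix(b"a\r\nb\r\n"))  # b'a\nb\n'
print(to_dos(b"a\nb\r\n"))     # b'a\r\nb\r\n'
```

Working on bytes rather than text avoids any automatic newline translation by the runtime while the conversion is being done.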

In conclusion, line endings are not just a technical detail but also a matter of standards and practical compatibility. Modern programming tools are flexible enough to handle both conventions, but when working with text files it is still important to know which convention each operating system uses and to process the data accordingly.

Source

CR and LF

The American Standard Code for Information Interchange (ASCII) defined control-characters including CARRIAGE-RETURN (CR) and LINE-FEED (LF) that were (and still are) used to control the print-position on printers in a way analogous to the mechanical typewriters that preceded early computer printers.

Platform dependency

In Windows, the traditional line separator in text files is CR followed by LF.

In old (pre-OS X) Apple Macintosh systems, the traditional line separator in text files was CR.

In Unix and Linux, the traditional line separator in text files is LF.

\n and \r

In many programming and scripting languages \n means «new line». Sometimes (but not always) this means the ASCII LINE-FEED character (LF), which, as you say, moves the cursor (or print position) down one line. In a printer or typewriter, this would actually move the paper up one line.

Invariably \r means the ASCII CARRIAGE-RETURN character (CR), whose name comes from mechanical typewriters. These had a carriage-return key that caused the roller ("carriage") carrying the paper to move to the right, powered by a spring, as far as it would go, thus setting the current typing position to the left margin.

Programming

In some programming languages \n can mean a platform-dependent sequence of characters that end or separate lines in a text file. For example in Perl, print "\n" produces a different sequence of characters on Linux than on Windows.

In Java, if you want the native line endings of the runtime platform, best practice is not to use \n or \r at all but System.getProperty("line.separator"). Use \n and \r where you want LF and CR regardless of platform (e.g. as used in HTTP, FTP and other Internet communications protocols).

Unix stty

In a Unix shell, the stty command can be used to make the terminal driver translate between these conventions. For example, stty onlcr causes all outgoing LFs to be translated to CR LF, and stty -onlcr turns that translation off.

Linux and OSX follow Unix conventions

Text files

Text files are still enormously important and widely used. For example, HTML and XML are text file formats. Most of the important Internet protocols, such as HTTP, follow text-file conventions and include specifications for line endings.

Printers

Most printers, other than the very cheapest, still respect CR and LF. In fact they are fundamental to the most widely used page description languages: PCL and PostScript.

Source

As a supplement,

1. Carriage return: a printer term meaning that the print position moves to the beginning of the current line. In computing it usually means returning to the beginning of the current line, and only rarely stands for a new line.

2. Line feed: a printer term meaning that the paper advances one line. Carriage return and line feed are therefore used together to start printing at the beginning of a new line. In computing it generally has the same meaning as newline.

3. Form feed: a printer term. I like the explanation in this thread:

If you were programming for a 1980s-style printer, it would eject the
paper and start a new page. You are virtually certain to never need
it.

http://en.wikipedia.org/wiki/Form_feed

It's almost obsolete; for a detailed explanation, see "Escape sequence \f - form feed - what exactly is it?".

Note that CR, LF, or CRLF stands for newline on some platforms but not on others. See the Wikipedia article "Newline" for details:

LF: Multics, Unix and Unix-like systems (Linux, OS X, FreeBSD, AIX,
Xenix, etc.), BeOS, Amiga, RISC OS, and others

CR: Commodore 8-bit machines, Acorn BBC, ZX Spectrum, TRS-80, Apple
II family, Oberon, the classic Mac OS up to version 9, MIT Lisp
Machine and OS-9

RS: QNX pre-POSIX implementation

0x9B: Atari 8-bit machines using ATASCII variant of ASCII (155 in
decimal)

CR+LF: Microsoft Windows, DOS (MS-DOS, PC DOS, etc.), DEC TOPS-10,
RT-11, CP/M, MP/M, Atari TOS, OS/2, Symbian OS, Palm OS, Amstrad CPC,
and most other early non-Unix and non-IBM OSes

LF+CR: Acorn BBC and RISC OS spooled text output.

Source
