Windows-1251 and ANSI encodings

Encodings: useful information and a brief retrospective

I wrote this article as a short overview of text encodings.

We will look at what an encoding actually is and touch on the history of how encodings came about.

We will also cover some of their peculiarities and the points that let you work with encodings more deliberately and avoid so-called mojibake ("krakozyabry" in Russian slang), that is, unreadable characters, on your site.

So, let's go…

What is an encoding?

Put simply, an encoding is a table that maps the characters we see on screen to specific numeric codes.

That is, every character we type on the keyboard or see on the monitor is stored as a specific sequence of bits (zeros and ones). Eight bits, as you probably know, make up 1 byte, but more on that a little later.

The appearance of the characters themselves is determined by the font files installed on your computer. Displaying text can therefore be described as continuously matching sequences of zeros and ones to specific characters (glyphs) in a font.
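The idea of an encoding as a lookup table can be sketched in a few lines of Python. This is only an illustration: the sample letter and the two Cyrillic code tables (Windows-1251 and KOI8-R) are chosen here as examples and are not prescribed by the article.

```python
# One character, one Unicode code point, but different bytes per code table.
text = "Я"

code_point = ord(text)                # the character's number in Unicode
cp1251_bytes = text.encode("cp1251")  # its single byte in Windows-1251
koi8r_bytes = text.encode("koi8_r")   # a different single byte in KOI8-R

print(hex(code_point))     # 0x42f
print(cp1251_bytes.hex())  # df
print(koi8r_bytes.hex())   # f1
```

The same letter ends up as a different byte in each table, which is exactly why the reader of a file has to know which table the writer used.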

ASCII can be considered the ancestor of all modern encodings.

The abbreviation stands for American Standard Code for Information Interchange.

It is a 7-bit encoding, normally stored one character per byte, that defines just 128 characters: the letters of the Latin alphabet, the Arabic numerals, punctuation and control codes.

Because ASCII leaves the eighth bit of each byte unused, 8-bit extensions appeared later, making it possible to use not 128 but 256 (2 to the power of 8) different characters that fit in a single byte.

This made it possible to add the characters of national alphabets alongside the existing Latin letters.

There are a great many extended ASCII variants, simply because there are many languages in the world. Many of you have probably heard of KOI8-R, which is also an extended ASCII encoding, designed for Russian text.

The next step in the development of encodings was the appearance of the so-called ANSI encodings.

In essence these were the same extended versions of ASCII, but with the various box-drawing (pseudographic) characters removed and typographic characters added, for which there had previously been no free slots.

A well-known example of such an ANSI encoding is Windows-1251. Besides typographic characters, it also includes letters from the alphabets of languages close to Russian (Ukrainian, Belarusian, Serbian, Macedonian and Bulgarian).

"ANSI encoding" is a collective name. The actual encoding used when you choose ANSI is determined by the code page configured in your Windows operating system (the setting is stored in the registry). For Russian it is Windows-1251; for other languages it is a different ANSI code page.

As you can imagine, the multitude of encodings and the lack of a single standard did no good, and this became the cause of frequent encounters with mojibake: an unreadable, meaningless set of characters.

The cause is simple: it is an attempt to display characters encoded with one code table using a different code table.
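The mechanism is easy to reproduce. The sketch below assumes one particular mix-up (text written as UTF-8 but read as Windows-1251); the sample word is arbitrary.

```python
original = "Привет"

# The writer saves the text as UTF-8 ...
raw = original.encode("utf-8")

# ... and the reader wrongly interprets the same bytes as Windows-1251.
garbled = raw.decode("cp1251")
print(garbled)  # twelve unrelated characters instead of six letters

# Because no bytes were lost, reversing the wrong step restores the text.
restored = garbled.encode("cp1251").decode("utf-8")
print(restored)
```

The repair works here only because every byte happened to be valid in the wrong table; in general a wrong decode can fail or lose information.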

In web development, we can run into mojibake when, for example, Russian text is mistakenly saved in an encoding other than the one the server is configured to use.

Of course, that is not the only way to end up with unreadable text. There are plenty of possibilities, especially considering that there is also a database, where information is likewise stored in a particular encoding, plus the collation of the database connection, and so on.

These problems were the stimulus to create something new: an encoding that could represent any language in the world (single-byte encodings cannot, however hard you try, describe all the characters of, say, Chinese, which clearly has more than 256 of them), along with any additional special and typographic characters.

In short, a universal encoding was needed that would solve the mojibake problem once and for all.

Unicode: a universal text encoding (UTF-32, UTF-16 and UTF-8)

The standard was proposed in 1991 by the nonprofit Unicode Consortium (Unicode Inc.). Its simplest encoding form is UTF-32.

Incidentally, the abbreviation UTF stands for Unicode Transformation Format.

This encoding uses a full 32 bits, i.e. 4 bytes, for every character. Compared with single-byte encodings, the conclusion is simple: encoding 1 character takes 4 times as many bits, which makes the file 4 times "heavier".

It is also obvious that the number of characters 32 bits could potentially describe exceeds all reasonable limits: technically it is bounded by 2 to the power of 32, although the Unicode standard itself limits the code space to 1,114,112 code points. This is clearly excessive and wasteful in terms of file size, so the encoding is rarely used for storing or exchanging text.

A more compact form is UTF-16.

As the name suggests, a character here is encoded with 16 bits (i.e. 2 bytes) rather than 32. For the characters that fit in one 16-bit unit, this makes each of them half the size it has in UTF-32, but still twice the size of a character in a single-byte encoding.

A single 16-bit unit can represent 2 to the power of 16, i.e. 65,536, characters. That already looks good, and the UTF-16 code space was later extended to more than 1 million characters by encoding the additional ones as pairs of 16-bit units (surrogate pairs, 4 bytes in total).

Yet this encoding did not fully satisfy developers either. If you write using only Latin characters, for instance, then after moving from an extended ASCII encoding to UTF-16 every file doubles in size.

This is the gap filled by the encoding we all know: UTF-8.

UTF-8 is a multibyte encoding with variable-length characters. Looking at the name, you might assume by analogy with UTF-32 and UTF-16 that it uses 8 bits per character, but that is not so. Or rather, not quite so.

UTF-8 offers the best compatibility with older systems that used 8-bit characters. A single character in UTF-8 takes from 1 to 4 bytes (the original design allowed sequences of up to 6 bytes, but the current standard stops at 4).

All Latin characters in UTF-8 are encoded with 8 bits, just as in ASCII. In other words, the basic part of ASCII (128 characters) carried over into UTF-8, so these characters cost only 1 byte each while the encoding keeps the universality that was the whole point.

So, while the first 128 characters take 1 byte each, all other characters take 2 bytes or more. In particular, each letter of the Russian alphabet takes exactly 2 bytes.

We thus have a universal encoding that covers every character we might need to display without making files needlessly "heavier".
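The size differences between the three Unicode encodings can be checked directly. This is a small sketch; the four sample characters are arbitrary, and the little-endian variants are used only so that Python does not prepend a byte order mark to the result.

```python
samples = {"Latin": "A", "Cyrillic": "Я", "CJK": "中", "Emoji": "😀"}
encodings = ("utf-8", "utf-16-le", "utf-32-le")

# sizes[name][encoding] -> number of bytes needed for that one character
sizes = {}
for name, char in samples.items():
    sizes[name] = {enc: len(char.encode(enc)) for enc in encodings}

for name, row in sizes.items():
    print(name, row)
```

UTF-32 always spends 4 bytes, UTF-16 spends 2 or 4, and UTF-8 ranges from 1 byte for Latin letters to 4 bytes for characters outside the basic plane.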

With or without BOM?

If you have worked with text editors (code editors) such as Notepad++, phpDesigner, rapid PHP and so on, you have probably noticed that when setting the encoding for a new page you can usually choose from 3 options:

- ANSI
- UTF-8
- UTF-8 without BOM

I'll say right away that you should always choose the last option: UTF-8 without BOM.

So what is a BOM, and why don't we need it?

BOM stands for Byte Order Mark. It is a special Unicode character (U+FEFF) used to indicate the byte order of a text file. According to the specification its use is optional, but if a BOM is used, it must be placed at the very beginning of the text file.

We won't go into the details of how the BOM works. The main takeaway for us is this: in a UTF-8 file the BOM adds three extra bytes (EF BB BF) at the start that many programs do not expect and treat as part of the content, which leads to errors in scripts.

So when working with UTF-8, use the "UTF-8 without BOM" option. It is also better to avoid editors that do not let you specify the encoding at all (for example, Notepad from the standard Windows programs).
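If you are not sure whether a file was saved with a BOM, you can look at its first bytes. The sketch below uses Python's standard codecs module; the helper name strip_utf8_bom is hypothetical and introduced only for this illustration.

```python
import codecs

def strip_utf8_bom(data: bytes) -> bytes:
    """Return the data without a leading UTF-8 BOM, if one is present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data

with_bom = "abc".encode("utf-8-sig")  # "utf-8-sig" writes the BOM first
without_bom = "abc".encode("utf-8")   # plain UTF-8, no BOM

print(with_bom[:3].hex())             # efbbbf
print(strip_utf8_bom(with_bom) == without_bom)

# Decoding as plain UTF-8 keeps the BOM as an invisible first character.
leaked = with_bom.decode("utf-8")
print(repr(leaked[0]))                # '\ufeff'
```

That invisible leading character is what trips up scripts that expect the file to begin with their own syntax.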

The encoding of the file currently open in a code editor is usually shown at the bottom of the window.

Note that the label "ANSI as UTF-8" in Notepad++ means the same thing as "UTF-8 without BOM".

In phpDesigner you cannot tell at a glance whether a BOM is used. To find out, right-click the "UTF-8" label; the pop-up shows whether a BOM is used (the Save with BOM option).

In the rapid PHP editor, UTF-8 without BOM is shown as "UTF-8*".

As you can see, things look a little different from editor to editor, but you get the main idea.

Once the document is saved in UTF-8 without BOM, also make sure the correct encoding is declared in the meta tag in the head section of your HTML document:

<meta charset="utf-8" />

Following these simple rules will already help you avoid many encoding problems.

That's all. I hope this short excursion and the explanations have helped you better understand what encodings are, what kinds exist and how they work.

If you are interested in the topic from a more practical angle, I recommend my video lesson "Полный UTF-8: чеклист для начинающих" ("Complete UTF-8: a checklist for beginners").

Dmitry Naumenko.


zloypk

Can you tell me: are the ANSI encoding and windows-1251 the same thing?


ANSI is a standards institute. Strictly speaking, no such encoding exists. "ANSI" is often understood to mean the single-byte encoding currently selected in the user's system. But you should not count on the user's machine having exactly the same regional settings as yours.
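To make the point concrete, here is a quick sketch of how the same three bytes read under three different "ANSI" code pages. The byte values and the choice of code pages are arbitrary examples.

```python
raw = bytes([0xC0, 0xE1, 0xE2])

# The same bytes, interpreted with three different Windows code pages.
decoded = {cp: raw.decode(cp) for cp in ("cp1251", "cp1252", "cp1253")}

for code_page, text in decoded.items():
    print(code_page, text)
```

On a Russian system these bytes are Cyrillic letters, on a Western European one they are accented Latin letters, and on a Greek one they are Greek letters, so a file labelled just "ANSI" does not say which reading is intended.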


Before answering the question of what the Windows ANSI encoding is, let us first answer another one: what is an encoding in general?

Every computer and every system uses a particular set of characters, which depends on the user's language, professional needs and personal preferences.

General definition of an encoding

Russian, for instance, uses 33 characters for its letters, and English uses 26. There are also 10 digits (0, 1, 2, 3, 4, 5, 6, 7, 8, 9) and a number of special characters, including the comma, minus sign, space, period, percent sign and so on.

A code table assigns each of these characters an ordinal number. For example, the letter "A" might be assigned number 1, "Z" number 26, and so on.

The number that represents a character as an integer is called the character code, and an encoding is, accordingly, the full set of such character-to-number assignments in the table.

The rich variety of code tables

A fairly large number of encodings and code tables are in use today: ASCII, developed in the USA in 1963; Windows-1251, popular until quite recently thanks to Microsoft; KOI8-R; Guobiao; and many, many others. New ones still appear and old ones still die out to this day.

In this long list, the so-called ANSI encoding stands apart.

The point is that Microsoft at one time created a whole set of code pages:

Windows-874	Thai
Windows-1250	Central European
Windows-1251	Cyrillic (all characters of Russian plus characters of related languages)
Windows-1252	Western European
Windows-1253	Greek
Windows-1254	Turkish
Windows-1255	Hebrew
Windows-1256	Arabic
Windows-1257	Baltic
Windows-1258	Vietnamese

Collectively they became known as ANSI encoding tables, or ANSI code pages.

An interesting fact: one of the first code tables was ASCII, published in 1963 by the American Standards Association, the body that was later renamed the American National Standards Institute, or ANSI for short.

Among other things, this encoding also contains non-printable characters. One of them, ESC, introduces the so-called escape (control) sequences, which were not uniform across terminals and were often mutually incompatible. Used skilfully, however, they made it possible to hide and restore the cursor, move it from one position in the text to another, set tab stops, erase part of the terminal window, change text formatting on the screen and change colours (or even draw and emit sound signals!). In 1976, incidentally, this was quite a help to programmers. A terminal, by the way, is a device for entering and displaying information. In those distant days it consisted of a monitor and a keyboard connected to a computer.

Incorrect display of characters

Unfortunately, this system later caused numerous failures, displaying so-called mojibake, meaningless and unreadable strings of characters, instead of the expected poems, news feeds or descriptions of favourite computer games. These ubiquitous errors were caused simply by attempts to display characters encoded with one code table using another.

We still run into the consequences of misreading this encoding most often on the Internet, when the browser for some reason cannot reliably determine which of the Windows-125x encodings is in use, because the webmaster declared the generic "ANSI" encoding or an encoding that was wrong from the start, for example 1252 instead of 1251. The full table for Windows-1251 is given below.

Cyrillic ANSI code table: Windows-1251

Dec	Hex	Char	Dec	Hex	Char	Dec	Hex	Char
000	00	NUL	086	56	V	171	AB	«
001	01	SOH	087	57	W	172	AC	¬
002	02	STX	088	58	X	173	AD	SHY
003	03	ETX	089	59	Y	174	AE	®
004	04	EOT	090	5A	Z	175	AF	Ї
005	05	ENQ	091	5B	[	176	B0	°
006	06	ACK	092	5C	\	177	B1	±
007	07	BEL	093	5D	]	178	B2	І
008	08	BS	094	5E	^	179	B3	і
009	09	TAB	095	5F	_	180	B4	ґ
010	0A	LF	096	60	`	181	B5	µ
011	0B	VT	097	61	a	182	B6	¶
012	0C	FF	098	62	b	183	B7	·
013	0D	CR	099	63	c	184	B8	ё
014	0E	SO	100	64	d	185	B9	№
015	0F	SI	101	65	e	186	BA	є
016	10	DLE	102	66	f	187	BB	»
017	11	DC1	103	67	g	188	BC	ј
018	12	DC2	104	68	h	189	BD	Ѕ
019	13	DC3	105	69	i	190	BE	ѕ
020	14	DC4	106	6A	j	191	BF	ї
021	15	NAK	107	6B	k	192	C0	А
022	16	SYN	108	6C	l	193	C1	Б
023	17	ETB	109	6D	m	194	C2	В
024	18	CAN	110	6E	n	195	C3	Г
025	19	EM	111	6F	o	196	C4	Д
026	1A	SUB	112	70	p	197	C5	Е
027	1B	ESC	113	71	q	198	C6	Ж
028	1C	FS	114	72	r	199	C7	З
029	1D	GS	115	73	s	200	C8	И
030	1E	RS	116	74	t	201	C9	Й
031	1F	US	117	75	u	202	CA	К
032	20	Space	118	76	v	203	CB	Л
033	21	!	119	77	w	204	CC	М
034	22	"	120	78	x	205	CD	Н
035	23	#	121	79	y	206	CE	О
036	24	$	122	7A	z	207	CF	П
037	25	%	123	7B	{	208	D0	Р
038	26	&	124	7C	|	209	D1	С
039	27	'	125	7D	}	210	D2	Т
040	28	(	126	7E	~	211	D3	У
041	29	)	127	7F	DEL	212	D4	Ф
042	2A	*	128	80	Ђ	213	D5	Х
043	2B	+	129	81	Ѓ	214	D6	Ц
044	2C	,	130	82	‚	215	D7	Ч
045	2D	-	131	83	ѓ	216	D8	Ш
046	2E	.	132	84	„	217	D9	Щ
047	2F	/	133	85	…	218	DA	Ъ
048	30	0	134	86	†	219	DB	Ы
049	31	1	135	87	‡	220	DC	Ь
050	32	2	136	88	€	221	DD	Э
051	33	3	137	89	‰	222	DE	Ю
052	34	4	138	8A	Љ	223	DF	Я
053	35	5	139	8B	‹	224	E0	а
054	36	6	140	8C	Њ	225	E1	б
055	37	7	141	8D	Ќ	226	E2	в
056	38	8	142	8E	Ћ	227	E3	г
057	39	9	143	8F	Џ	228	E4	д
058	3A	:	144	90	ђ	229	E5	е
059	3B	;	145	91	‘	230	E6	ж
060	3C	<	146	92	’	231	E7	з
061	3D	=	147	93	“	232	E8	и
062	3E	>	148	94	”	233	E9	й
063	3F	?	149	95	•	234	EA	к
064	40	@	150	96	–	235	EB	л
065	41	A	151	97	—	236	EC	м
066	42	B	152	98	(unassigned)	237	ED	н
067	43	C	153	99	™	238	EE	о
068	44	D	154	9A	љ	239	EF	п
069	45	E	155	9B	›	240	F0	р
070	46	F	156	9C	њ	241	F1	с
071	47	G	157	9D	ќ	242	F2	т
072	48	H	158	9E	ћ	243	F3	у
073	49	I	159	9F	џ	244	F4	ф
074	4A	J	160	A0	NBSP	245	F5	х
075	4B	K	161	A1	Ў	246	F6	ц
076	4C	L	162	A2	ў	247	F7	ч
077	4D	M	163	A3	Ј	248	F8	ш
078	4E	N	164	A4	¤	249	F9	щ
079	4F	O	165	A5	Ґ	250	FA	ъ
080	50	P	166	A6	¦	251	FB	ы
081	51	Q	167	A7	§	252	FC	ь
082	52	R	168	A8	Ё	253	FD	э
083	53	S	169	A9	©	254	FE	ю
084	54	T	170	AA	Є	255	FF	я
085	55	U
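A table like this does not have to be copied by hand; it can be regenerated from the codec that ships with Python. The following is a minimal sketch: the helper name cp1251_row and the "(unassigned)" and "U+XXXX" labels are choices made for this illustration only.

```python
def cp1251_row(code):
    """Return (decimal, hex, label) for one byte value in Windows-1251."""
    try:
        char = bytes([code]).decode("cp1251")
    except UnicodeDecodeError:
        # Python's cp1251 codec has no character for this byte value.
        return code, f"{code:02X}", "(unassigned)"
    # Control and other non-printable characters are shown by code point.
    label = char if char.isprintable() else f"U+{ord(char):04X}"
    return code, f"{code:02X}", label

# Print a few sample rows; use range(256) for the whole table.
for code in (0x41, 0x98, 0xA8, 0xB8, 0xC0, 0xFF):
    print("%03d\t%s\t%s" % cp1251_row(code))
```

Generating the rows this way avoids the typical copy errors in printed tables, such as confusing upper-case and lower-case letters that look alike.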

Moreover, in 1986 the possibilities of ANSI were expanded considerably thanks to Ian E. Davis, who wrote TheDraw, a package that made it possible not just to use functions that seem basic to us today, but to draw fully (or almost fully) fledged pictures!

Summing up

As we can see, the ANSI encoding, although a rather controversial solution, still holds its position.

Over time, thanks to enthusiasts, the ancient ANSI terminal has even made its way onto phones!


From Wikipedia, the free encyclopedia

Windows code pages are sets of characters or code pages (known as character encodings in other operating systems) used in Microsoft Windows from the 1980s and 1990s. Windows code pages were gradually superseded when Unicode was implemented in Windows,^{[citation needed]} although they are still supported both within Windows and other platforms, and still apply when Alt code shortcuts are used.

There are two groups of system code pages in Windows systems: OEM and Windows-native ("ANSI") code pages.
(ANSI is the American National Standards Institute.) Code pages in both of these groups are extended ASCII code pages. Additional code pages are supported by standard Windows conversion routines, but not used as either type of system code page.

ANSI code page

Windows-125x series

Alias(es)	ANSI (misnomer)
Standard	WHATWG Encoding Standard
Extends	US-ASCII
Preceded by	ISO 8859
Succeeded by	Unicode UTF-16 (in Win32 API)

ANSI code pages (officially called "Windows code pages" ^[1] after Microsoft accepted the former term being a misnomer ^[2]) are used for native non-Unicode (say, byte oriented) applications using a graphical user interface on Windows systems. The term "ANSI" is a misnomer because these Windows code pages do not comply with any ANSI (American National Standards Institute) standard; code page 1252 was based on an early ANSI draft that became the international standard ISO 8859-1, ^[2] which adds a further 32 control codes and space for 96 printable characters. Among other differences, Windows code pages allocate printable characters to the supplementary control code space, making them at best illegible to standards-compliant operating systems.

Most legacy "ANSI" code pages have code page numbers in the pattern 125x. However, 874 (Thai) and the East Asian multi-byte "ANSI" code pages (932, 936, 949, 950), all of which are also used as OEM code pages, are numbered to match IBM encodings, none of which are identical to the Windows encodings (although most are similar). While code page 1258 is also used as an OEM code page, it is original to Microsoft rather than an extension to an existing encoding. IBM has assigned its own, different numbers to Microsoft's variants; these are given for reference in the lists below where applicable.

All of the 125x Windows code pages, as well as 874 and 936, are labelled by the Internet Assigned Numbers Authority (IANA) as "Windows-number", although "Windows-936" is treated as a synonym for "GBK". Windows code page 932 is instead labelled as "Windows-31J".^[3]

ANSI Windows code pages, and especially the code page 1252, were so called since they were purportedly based on drafts submitted or intended for ANSI. However, ANSI and ISO have not standardized any of these code pages. Instead they are either:^[2]

Supersets of the standard sets such as those of ISO 8859 and the various national standards (like Windows-1252 vs. ISO-8859-1),
Major modifications of these (making them incompatible to various degrees, like Windows-1250 vs. ISO-8859-2)
Having no parallel encoding (like Windows-1257 vs. ISO-8859-4; ISO-8859-13 was introduced much later). Also, Windows-1251 follows neither the ISO-standardised ISO-8859-5 nor the then-prevailing KOI-8.

Microsoft assigned about twelve of the typography and business characters (including notably, the euro sign, €) in CP1252 to the code points 0x80–0x9F that, in ISO 8859, are assigned to C1 control codes. These assignments are also present in many other ANSI/Windows code pages at the same code-points. Windows did not use the C1 control codes, so this decision had no direct effect on Windows users. However, if included in a file transferred to a standards-compliant platform like Unix or MacOS, the information was invisible and potentially disruptive.^{[citation needed]}
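The difference in the 0x80–0x9F range can be observed with Python's built-in codecs. This is an illustrative sketch; the sample bytes are arbitrary.

```python
import unicodedata

raw = b"\x80\x93quote\x94"

# Windows-1252 maps these byte values to printable characters.
as_cp1252 = raw.decode("cp1252")

# ISO 8859-1 (latin-1) maps the same byte values to C1 control codes.
as_latin1 = raw.decode("latin-1")

print(as_cp1252)
print([unicodedata.category(ch) for ch in as_latin1[:2]])  # ['Cc', 'Cc']
```

The euro sign and the curly quotes exist only in the Windows-1252 reading; under ISO 8859-1 the same bytes become invisible control characters.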

OEM code page

The OEM code pages (original equipment manufacturer) are used by Win32 console applications, and by virtual DOS, and can be considered a holdover from DOS and the original IBM PC architecture. A separate suite of code pages was implemented not only due to compatibility, but also because the fonts of VGA (and descendant) hardware suggest encoding of line-drawing characters to be compatible with code page 437. Most OEM code pages share many code points, particularly for non-letter characters, with the second (non-ASCII) half of CP437.

A typical OEM code page, in its second half, does not resemble any ANSI/Windows code page even roughly. Nevertheless, two single-byte, fixed-width code pages (874 for Thai and 1258 for Vietnamese) and four multibyte CJK code pages (932, 936, 949, 950) are used as both OEM and ANSI code pages. Code page 1258 uses combining diacritics, as Vietnamese requires more than 128 letter-diacritic combinations. This is in contrast to VISCII, which replaces some of the C0 (i.e. ASCII) control codes.
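As an illustration of how little the two families have in common in their upper halves, the sketch below compares the Russian OEM code page 866 with the ANSI code page 1251. The sample word is arbitrary.

```python
text = "Привет"

oem_bytes = text.encode("cp866")    # OEM (console) representation
ansi_bytes = text.encode("cp1251")  # ANSI (GUI) representation

print(oem_bytes.hex())
print(ansi_bytes.hex())

# Console output read with the ANSI code page shows the wrong characters.
misread = oem_bytes.decode("cp1251")
print(misread)

# Decoding with the matching OEM code page recovers the text.
print(oem_bytes.decode("cp866"))
```

This mismatch is why text written by a console program can look garbled when opened in a GUI editor on the same Windows machine.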

History

Initially, computer systems and system programming languages did not make a distinction between characters and bytes: for the segmental scripts used in most of Africa, the Americas, southern and south-east Asia, the Middle East and Europe, a character needs just one byte, but two or more bytes are needed for the ideographic sets used in the rest of the world. This subsequently led to much confusion. Microsoft software and systems prior to the Windows NT line are examples of this, because they use the OEM and ANSI code pages that do not make the distinction.

Since the late 1990s, software and systems have adopted Unicode as their preferred storage format; this trend has been reinforced by the widespread adoption of XML, which defaults to UTF-8 but also provides a mechanism for labelling the encoding used.^[4] All current Microsoft products and application program interfaces use Unicode internally,^{[citation needed]} but some applications continue to use the default encoding of the computer's "locale" when reading and writing text data to files or standard output.^{[citation needed]} Therefore, files may still be encountered that are legible and intelligible in one part of the world but unintelligible mojibake in another.

UTF-8, UTF-16

Microsoft adopted a Unicode encoding for all its operating systems from Windows NT onwards: first the now-obsolete UCS-2, which was then Unicode's only encoding, and later UTF-16. Windows additionally supports UTF-8 (aka CP_UTF8) since Windows 10 version 1803.^[5]
UTF-16 encodes every Unicode character in the Basic Multilingual Plane (BMP) with a single 16-bit unit, while the remaining characters (e.g. emojis) take 32 bits (four bytes), encoded as a pair of 16-bit units. The rest of the industry (Unix-like systems and the web), and now Microsoft, chose UTF-8, which uses one byte for the 7-bit ASCII character set, two or three bytes for other characters in the BMP, and four bytes for the remainder.
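How a character outside the BMP becomes two 16-bit units can be worked through by hand. The sketch below follows the standard UTF-16 surrogate arithmetic; the function name to_surrogates is hypothetical and used only for this illustration.

```python
import struct

def to_surrogates(code_point):
    """Split a code point above U+FFFF into a UTF-16 surrogate pair."""
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)   # top 10 bits of the offset
    low = 0xDC00 + (offset & 0x3FF)  # bottom 10 bits of the offset
    return high, low

emoji = "😀"  # U+1F600, outside the BMP
manual = to_surrogates(ord(emoji))

# Compare with what Python's own UTF-16 encoder produces (big-endian).
encoded = struct.unpack(">HH", emoji.encode("utf-16-be"))

print([hex(unit) for unit in manual])   # ['0xd83d', '0xde00']
print(manual == encoded)
```

A BMP character such as "Я" needs no such pair and occupies a single 16-bit unit.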

List

The following Windows code pages exist:

Windows-125x series

These nine code pages are all extended ASCII 8-bit SBCS encodings, and were designed by Microsoft for use as ANSI codepages on Windows. They are commonly known by their IANA-registered^[6] names as windows-<number>, but are also sometimes called cp<number>, «cp» for «code page». They are all used as ANSI code pages; Windows-1258 is also used as an OEM code page.

The Windows-125x series includes nine of the ANSI code pages, and mostly covers scripts from Europe and West Asia with the addition of Vietnam. System encodings for Thai and for East Asian languages were numbered to match similar IBM code pages and are used as both ANSI and OEM code pages; these are covered in following sections.

ID	Description	Relationship to ISO 8859 or other established encodings
1250^[7]^[8]	Latin 2 / Central European	Similar to ISO-8859-2 but moves several characters, including multiple letters.
1251^[9]^[10]	Cyrillic	Incompatible with both ISO-8859-5 and KOI-8.
1252^[11]^[12]	Latin 1 / Western European	Superset of ISO-8859-1 (without C1 controls). Letter repertoire accordingly similar to CP850.
1253^[13]^[14]	Greek	Similar to ISO 8859-7 but moves several characters, including a letter.
1254^[15]^[16]	Turkish	Superset of ISO 8859-9 (without C1 controls).
1255^[17]^[18]	Hebrew	Almost a superset of ISO 8859-8, but with two incompatible punctuation changes.
1256^[19]^[20]	Arabic	Not compatible with ISO 8859-6; rather, OEM Code page 708 is an ISO 8859-6 (ASMO 708) superset.
1257^[21]^[22]	Baltic	Not ISO 8859-4; the later ISO 8859-13 is closely related, but with some differences in available punctuation.
1258^[23]^[24]	Vietnamese (also OEM)	Not related to VSCII or VISCII, uses fewer base characters with combining diacritics.

DOS code pages

These are also ASCII-based. Most of these are included for use as OEM code pages; code page 874 is also used as an ANSI code page.

437 – IBM PC US, 8-bit SBCS extended ASCII.^[25] Known as OEM-US, the encoding of the primary built-in font of VGA graphics cards.
708 – Arabic, extended ISO 8859-6 (ASMO 708)
720 – Arabic, retaining box drawing characters in their usual locations
737 – "MS-DOS Greek". Retains all box drawing characters. More popular than 869.
775 – "MS-DOS Baltic Rim"
850 – "MS-DOS Latin 1". Full (re-arranged) repertoire of ISO 8859-1.
852 – "MS-DOS Latin 2"
855 – "MS-DOS Cyrillic". Mainly used for South Slavic languages. Includes (re-arranged) repertoire of ISO-8859-5. Not to be confused with cp866.
857 – "MS-DOS Turkish"
858 – Western European with euro sign
860 – "MS-DOS Portuguese"
861 – "MS-DOS Icelandic"
862 – "MS-DOS Hebrew"
863 – "MS-DOS French Canada"
864 – Arabic
865 – "MS-DOS Nordic"
866 – "MS-DOS Cyrillic Russian", cp866. Sole purely OEM code page (rather than ANSI or both) included as a legacy encoding in WHATWG Encoding Standard for HTML5.
869 – "MS-DOS Greek 2", IBM869. Full (re-arranged) repertoire of ISO 8859-7.
874 – Thai, also used as the ANSI code page, extends ISO 8859-11 (and therefore TIS-620) with a few additional characters from Windows-1252. Corresponds to IBM code page 1162 (IBM-874 is similar but has different extensions).

East Asian multi-byte code pages

These often differ from the IBM code pages of the same number: code pages 932, 949 and 950 only partly match the IBM code pages of the same number, while the number 936 was used by IBM for another Simplified Chinese encoding which is now deprecated and Windows-951, as part of a kludge, is unrelated to IBM-951. IBM equivalent code pages are given in the second column. Code pages 932, 936, 949 and 950/951 are used as both ANSI and OEM code pages on the locales in question.

ID	Language	Encoding	IBM Equivalent	Difference from IBM CCSID of same number	Use
932	Japanese	Shift JIS (Microsoft variant)	943^[26]	IBM-932 is also Shift JIS, has fewer extensions (but those extensions it has are in common), and swaps some variant Chinese characters (itaiji) for interoperability with earlier editions of JIS C 6226.	ANSI/OEM (Japan)
936	Chinese (simplified)	GBK	1386	IBM-936 is a different Simplified Chinese encoding with a different encoding method, which has been deprecated since 1993.	ANSI/OEM (PRC, Singapore)
949	Korean	Unified Hangul Code	1363	IBM-949 is also an EUC-KR superset, but with different (colliding) extensions.	ANSI/OEM (Republic of Korea)
950	Chinese (traditional)	Big5 (Microsoft variant)	1373^[27]	IBM-950 is also Big5, but includes a different subset of the ETEN extensions, adds further extensions with an expanded trail byte range, and lacks the Euro.	ANSI/OEM (Taiwan, Hong Kong)
951	Chinese (traditional) including Cantonese	Big5-HKSCS (2001 ed.)	5471^[28]	IBM-951 is the double-byte plane from IBM-949 (see above), and unrelated to Microsoft’s internal use of the number 951.	ANSI/OEM (Hong Kong, 98/NT4/2000/XP with HKSCS patch)

A few further multiple-byte code pages are supported for decoding or encoding using operating system libraries, but not used as either sort of system encoding in any locale.

ID	IBM Equivalent	Language	Encoding	Use
1361	—	Korean	Johab (KS C 5601-1992 annex 3)	Conversion
20000	—	Chinese (traditional)	An encoding of CNS 11643	Conversion
20001	—	Chinese (traditional)	TCA	Conversion
20002	—	Chinese (traditional)	Big5 (ETEN variant)	Conversion
20003	938	Chinese (traditional)	IBM 5550	Conversion
20004	—	Chinese (traditional)	Teletext	Conversion
20005	—	Chinese (traditional)	Wang	Conversion
20932	954 (roughly)	Japanese	EUC-JP	Conversion
20936	5479	Chinese (simplified)	GB 2312	Conversion
20949, 51949	970	Korean	Wansung (8-bit with ASCII, i.e. EUC-KR)^[29]	Conversion

EBCDIC code pages

37 – IBM EBCDIC US-Canada, 8-bit SBCS^[30]
500 – Latin 1
870 – IBM870
875 – cp875
1026 – EBCDIC Turkish
1047 – IBM01047 – Latin 1
1140 – IBM01140
1141 – IBM01141
1142 – IBM01142
1143 – IBM01143
1144 – IBM01144
1145 – IBM01145
1146 – IBM01146
1147 – IBM01147
1148 – IBM01148
1149 – IBM01149
20273 – EBCDIC Germany
20277 – EBCDIC Denmark/Norway
20278 – EBCDIC Finland/Sweden
20280 – EBCDIC Italy
20284 – EBCDIC Latin America/Spain
20285 – EBCDIC United Kingdom
20290 – EBCDIC Japanese
20297 – EBCDIC France
20420 – EBCDIC Arabic
20423 – EBCDIC Greek
20424 – EBCDIC Hebrew
20833 – x-EBCDIC-KoreanExtended – EBCDIC Korean Extended
20838 – EBCDIC Thai
20871 – EBCDIC Icelandic
20880 – EBCDIC Cyrillic (Russian)
20905 – EBCDIC Turkish
20924 – IBM00924 – IBM EBCDIC Latin 1/Open System (1047 + Euro symbol)
21025 – EBCDIC Cyrillic (Serbian, Bulgarian)
21027 – Japanese EBCDIC (incomplete,^[31] deprecated)^[32]
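
Unlike the other families listed here, EBCDIC code pages are not ASCII-based. A short Python sketch, assuming that the standard-library codec `cp037` corresponds to code page 37 above; the sample text is arbitrary:

```python
text = "Hello 123"

ascii_bytes = text.encode("ascii")
ebcdic_bytes = text.encode("cp037")  # assumed to match code page 37

print(ascii_bytes.hex(" "))   # ASCII byte values
print(ebcdic_bytes.hex(" "))  # EBCDIC byte values: different for every character here

# Decoding with the matching code page restores the text.
print(ebcdic_bytes.decode("cp037") == text)
```

A program that assumes ASCII-compatible bytes will therefore misread even plain English text stored in an EBCDIC code page.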

Unicode-related code pages

1200 – Unicode (BMP of ISO 10646, UTF-16LE). Available only to managed applications.^[32]
1201 – Unicode (UTF-16BE). Available only to managed applications.^[32]
12000 – UTF-32. Available only to managed applications.^[32]
12001 – UTF-32. Big-endian. Available only to managed applications.^[32]
65000 – Unicode (UTF-7)
65001 – Unicode (UTF-8)
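
The difference between these Unicode code pages is only in how a code point is serialized to bytes. A small Python sketch, assuming the pairing of code page numbers with Python codec names shown in the comments (1200 as `utf-16-le`, 1201 as `utf-16-be`, 12000 as `utf-32-le`, 12001 as `utf-32-be`, 65001 as `utf-8`):

```python
ch = "Я"  # U+042F CYRILLIC CAPITAL LETTER YA

codecs_by_code_page = {
    1200: "utf-16-le",
    1201: "utf-16-be",
    12000: "utf-32-le",
    12001: "utf-32-be",
    65001: "utf-8",
}

# Encode the same character with each codec and keep the raw bytes.
encoded = {cp: ch.encode(name) for cp, name in codecs_by_code_page.items()}

for cp, data in encoded.items():
    print(f"{cp:>5}  {codecs_by_code_page[cp]:<10} {data.hex(' ')}")
```

The same character occupies two bytes in UTF-16 and UTF-8 and four bytes in UTF-32, with the byte order reversed between the little-endian and big-endian variants.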

Macintosh compatibility code pages

10000 – Apple Macintosh Roman
10001 – Apple Macintosh Japanese
10002 – Apple Macintosh Chinese (traditional) (BIG-5)
10003 – Apple Macintosh Korean
10004 – Apple Macintosh Arabic
10005 – Apple Macintosh Hebrew
10006 – Apple Macintosh Greek
10007 – Apple Macintosh Cyrillic
10008 – Apple Macintosh Chinese (simplified) (GB 2312)
10010 – Apple Macintosh Romanian
10017 – Apple Macintosh Ukrainian
10021 – Apple Macintosh Thai
10029 – Apple Macintosh Roman II / Central Europe
10079 – Apple Macintosh Icelandic
10081 – Apple Macintosh Turkish
10082 – Apple Macintosh Croatian

ISO 8859 code pages

28591 – ISO-8859-1 – Latin-1 (IBM equivalent: 819)
28592 – ISO-8859-2 – Latin-2
28593 – ISO-8859-3 – Latin-3 or South European
28594 – ISO-8859-4 – Latin-4 or North European
28595 – ISO-8859-5 – Latin/Cyrillic
28596 – ISO-8859-6 – Latin/Arabic
28597 – ISO-8859-7 – Latin/Greek
28598 – ISO-8859-8 – Latin/Hebrew
28599 – ISO-8859-9 – Latin-5 or Turkish
28600 – ISO-8859-10 – Latin-6
28601 – ISO-8859-11 – Latin/Thai
28602 – ISO-8859-12 – reserved for Latin/Devanagari but abandoned (not supported)
28603 – ISO-8859-13 – Latin-7 or Baltic Rim
28604 – ISO-8859-14 – Latin-8 or Celtic
28605 – ISO-8859-15 – Latin-9
28606 – ISO-8859-16 – Latin-10 or South-Eastern European
38596 – ISO-8859-6-I – Latin/Arabic (logical bidirectional order)
38598 – ISO-8859-8-I – Latin/Hebrew (logical bidirectional order)

ITU-T code pages

20105 – 7-bit IA5 IRV (Western European)^[33]^[34]^[35]
20106 – 7-bit IA5 German (DIN 66003)^[33]^[34]^[36]
20107 – 7-bit IA5 Swedish (SEN 850200 C)^[33]^[34]^[37]
20108 – 7-bit IA5 Norwegian (NS 4551-2)^[33]^[34]^[38]
20127 – 7-bit US-ASCII^[33]^[34]^[39]
20261 – T.61 (T.61-8bit)
20269 – ISO-6937

KOI8 code pages

20866 – Russian – KOI8-R
21866 – Ukrainian – KOI8-U (or KOI8-RU in some versions)^[40]

Problems arising from the use of code pages

Microsoft strongly recommends using Unicode in modern applications, but many applications or data files still depend on the legacy code pages.

Programs need to know what code page to use in order to display the contents of (pre-Unicode) files correctly. If a program uses the wrong code page it may show text as mojibake.
The code page in use may differ between machines, so (pre-Unicode) files created on one machine may be unreadable on another.
Data is often improperly tagged with the code page, or not tagged at all, making determination of the correct code page to read the data difficult.
These Microsoft code pages differ to various degrees from some of the standards and other vendors’ implementations. This isn’t a Microsoft issue per se, as it happens to all vendors, but the lack of consistency makes interoperability with other systems unreliable in some cases.
The use of code pages limits the set of characters that may be used.
Characters expressed in an unsupported code page may be converted to question marks (?) or other replacement characters, or to a simpler version (such as removing accents from a letter). In either case, the original character may be lost.
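
Two of these problems, mojibake and lossy replacement, can be reproduced in a few lines. The sketch below uses Python's standard `cp1251`, `koi8_r` and `cp1252` codecs; the sample word is an arbitrary choice.

```python
original = "Привет"
raw = original.encode("cp1251")   # bytes as written by a Windows-1251 program

mojibake = raw.decode("koi8_r")   # wrong code page: valid letters, wrong text
restored = raw.decode("cp1251")   # right code page: the text comes back intact

# Encoding into a code page that lacks these characters loses them for good:
# every Cyrillic letter becomes a question mark.
lossy = original.encode("cp1252", errors="replace")

print(mojibake)
print(restored)
print(lossy)
```

The mojibake case is recoverable as long as the bytes are untouched and the right code page can be guessed; the replacement case is not, because the original bytes are gone.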

References

[1] "Code Pages". 2016-03-07. Archived from the original on 2016-03-07. Retrieved 2021-05-26.
[2] "Glossary of Terms Used on this Site". 2018-12-08. Archived from the original on 2018-12-08. The term "ANSI" as used to signify Windows code pages is a historical reference, but is nowadays a misnomer that continues to persist in the Windows community. The source of this comes from the fact that the Windows code page 1252 was originally based on an ANSI draft, which became International Organization for Standardization (ISO) Standard 8859-1. "ANSI applications" are usually a reference to non-Unicode or code page–based applications.
[3] "Character Sets". www.iana.org. Archived from the original on 2021-05-25. Retrieved 2021-05-26.
[4] "Extensible Markup Language (XML) 1.1 (Second Edition): Character encodings". W3C. 2006-09-29. Archived from the original on 2021-04-19. Retrieved 2020-10-05.
[5] hylom (2017-11-14). "Windows 10のInsider PreviewでシステムロケールをUTF-8にするオプションが追加される" [The option to make UTF-8 the system locale added in Windows 10 Insider Preview]. スラド (in Japanese). Archived from the original on 2018-05-11. Retrieved 2018-05-10.
[6] "Character Sets". IANA. Archived from the original on 2016-12-03. Retrieved 2019-04-07.
[7] Microsoft. "Windows 1250". Archived from the original on 2014-07-14. Retrieved 2014-07-06.
[8] IBM. "SBCS code page information document CPGID 01250". Archived from the original on 2014-07-14. Retrieved 2014-07-06.
[9] Microsoft. "Windows 1251". Archived from the original on 2014-07-14. Retrieved 2014-07-06.
[10] IBM. "SBCS code page information document CPGID 01251". Archived from the original on 2014-07-14. Retrieved 2014-07-06.
[11] Microsoft. "Windows 1252". Archived from the original on 2013-05-04. Retrieved 2014-07-06.
[12] IBM. "SBCS code page information document CPGID 01252". Archived from the original on 2014-07-14. Retrieved 2014-07-06.
[13] Microsoft. "Windows 1253". Archived from the original on 2014-07-14. Retrieved 2014-07-06.
[14] IBM. "SBCS code page information document CPGID 01253". Archived from the original on 2014-07-14. Retrieved 2014-07-06.
[15] Microsoft. "Windows 1254". Archived from the original on 2014-07-14. Retrieved 2014-07-06.
[16] IBM. "SBCS code page information document CPGID 01254". Archived from the original on 2014-07-14. Retrieved 2014-07-06.
[17] Microsoft. "Windows 1255". Archived from the original on 2014-07-14. Retrieved 2014-07-06.
[18] IBM. "SBCS code page information document CPGID 01255". Archived from the original on 2014-07-14. Retrieved 2014-07-06.
[19] Microsoft. "Windows 1256". Archived from the original on 2014-07-14. Retrieved 2014-07-06.
[20] IBM. "SBCS code page information document CPGID 01256". Archived from the original on 2014-07-14. Retrieved 2014-07-06.
[21] Microsoft. "Windows 1257". Archived from the original on 2013-03-16. Retrieved 2014-07-06.
[22] IBM. "SBCS code page information document CPGID 01257". Archived from the original on 2014-07-14. Retrieved 2014-07-06.
[23] Microsoft. "Windows 1258". Archived from the original on 2013-10-25. Retrieved 2014-07-06.
[24] IBM. "SBCS code page information document CPGID 01258". Archived from the original on 2014-07-14. Retrieved 2014-07-06.
[25] IBM. "SBCS code page information document - CPGID 00437". Archived from the original on 2016-06-09. Retrieved 2014-07-04.
[26] "IBM-943 and IBM-932". IBM Knowledge Center. IBM. Archived from the original on 2018-08-18. Retrieved 2020-07-08.
[27] "Converter Explorer: ibm-1373_P100-2002". ICU Demonstration. International Components for Unicode. Archived from the original on 2021-05-26. Retrieved 2020-06-27.
[28] "Coded character set identifiers - CCSID 5471". IBM Globalization. IBM. Archived from the original on 2014-11-29.
[29] Julliard, Alexandre. "dump_krwansung_codepage: build Korean Wansung table from the KSX1001 file". make_unicode: Generate code page .c files from ftp.unicode.org descriptions. Wine Project. Archived from the original on 2021-05-26. Retrieved 2021-03-14.
[30] IBM. "SBCS code page information document - CPGID 00037". Archived from the original on 2014-07-14. Retrieved 2014-07-04.
[31] Steele, Shawn (2005-09-12). "Code Page 21027 'Extended/Ext Alpha Lowercase'". MSDN. Archived from the original on 2019-04-06. Retrieved 2019-04-06.
[32] "Code Page Identifiers". docs.microsoft.com. Archived from the original on 2019-04-07. Retrieved 2019-04-07.
[33] "Code Page Identifiers". Microsoft Developer Network. Microsoft. 2014. Archived from the original on 2016-06-19. Retrieved 2016-06-19.
[34] "Web Encodings - Internet Explorer - Encodings". WHATWG Wiki. 2012-10-23. Archived from the original on 2016-06-20. Retrieved 2016-06-20.
[35] Foller, Antonin (2014) [2011]. "Western European (IA5) encoding - Windows charsets". WUtils.com - Online web utility and help. Motobit Software. Archived from the original on 2016-06-20. Retrieved 2016-06-20.
[36] Foller, Antonin (2014) [2011]. "German (IA5) encoding - Windows charsets". WUtils.com - Online web utility and help. Motobit Software. Archived from the original on 2016-06-20. Retrieved 2016-06-20.
[37] Foller, Antonin (2014) [2011]. "Swedish (IA5) encoding - Windows charsets". WUtils.com - Online web utility and help. Motobit Software. Archived from the original on 2016-06-20. Retrieved 2016-06-20.
[38] Foller, Antonin (2014) [2011]. "Norwegian (IA5) encoding - Windows charsets". WUtils.com - Online web utility and help. Motobit Software. Archived from the original on 2016-06-20. Retrieved 2016-06-20.
[39] Foller, Antonin (2014) [2011]. "US-ASCII encoding - Windows charsets". WUtils.com - Online web utility and help. Motobit Software. Archived from the original on 2016-06-20. Retrieved 2016-06-20.
[40] Nechayev, Valentin (2013) [2001]. "Review of 8-bit Cyrillic encodings universe". Archived from the original on 2016-12-05. Retrieved 2016-12-05.

External links

National Language Support (NLS) API Reference. Table showing ANSI and OEM code pages per language (from a web archive, since Microsoft removed the original page)
IANA Charset Name Registrations
Unicode mapping table for Windows code pages
Unicode mappings of Windows code pages with "best fit"


From Wikipedia, the free encyclopedia

ANSI code page

Windows-125x series

Alias(es)	ANSI (misnomer)
Standard	WHATWG Encoding Standard
Extends	US-ASCII
Preceded by	ISO 8859
Succeeded by	Unicode UTF-16 (in Win32 API)

Compared with the standard encodings, the ANSI code pages fall into three groups:
Supersets of standard sets such as those of ISO 8859 and the various national standards (for example, Windows-1252 vs. ISO-8859-1).
Major modifications of those standards, which make them incompatible to various degrees (for example, Windows-1250 vs. ISO-8859-2).
Code pages with no parallel standard encoding (for example, Windows-1257 vs. ISO-8859-4; ISO-8859-13 was introduced much later). Likewise, Windows-1251 follows neither the ISO-standardised ISO-8859-5 nor the then-prevailing KOI8.

OEM code page

History

UTF-8, UTF-16

List

The following Windows code pages exist:

Windows-125x series

ID	Description	Relationship to ISO 8859 or other established encodings
1250^[7]^[8]	Latin 2 / Central European	Similar to ISO-8859-2 but moves several characters, including multiple letters.
1251^[9]^[10]	Cyrillic	Incompatible with both ISO-8859-5 and KOI-8.
1252^[11]^[12]	Latin 1 / Western European	Superset of ISO-8859-1 (without C1 controls). Letter repertoire accordingly similar to CP850.
1253^[13]^[14]	Greek	Similar to ISO 8859-7 but moves several characters, including a letter.
1254^[15]^[16]	Turkish	Superset of ISO 8859-9 (without C1 controls).
1255^[17]^[18]	Hebrew	Almost a superset of ISO 8859-8, but with two incompatible punctuation changes.
1256^[19]^[20]	Arabic	Not compatible with ISO 8859-6; rather, OEM Code page 708 is an ISO 8859-6 (ASMO 708) superset.
1257^[21]^[22]	Baltic	Not ISO 8859-4; the later ISO 8859-13 is closely related, but with some differences in available punctuation.
1258^[23]^[24]	Vietnamese (also OEM)	Not related to VSCII or VISCII, uses fewer base characters with combining diacritics.
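
The relationships in this table can be checked directly. A small Python sketch using the standard-library codecs `cp1252`, `latin-1`, `cp1251`, `iso8859_5` and `koi8_r`, on the assumption that they track the code pages described here closely enough for a demonstration:

```python
# Windows-1252 vs. ISO-8859-1: identical from 0xA0 upwards,
# different in the 0x80-0x9F range (printable characters vs. C1 controls).
high = bytes(range(0xA0, 0x100))
same_upper_half = high.decode("cp1252") == high.decode("latin-1")
euro_cp1252 = b"\x80".decode("cp1252")
euro_latin1 = b"\x80".decode("latin-1")
print(same_upper_half, repr(euro_cp1252), repr(euro_latin1))

# Windows-1251, ISO-8859-5 and KOI8-R each place the same Cyrillic
# letter at a different byte value, so the three are mutually incompatible.
ya_bytes = {codec: "Я".encode(codec) for codec in ("cp1251", "iso8859_5", "koi8_r")}
for codec, data in ya_bytes.items():
    print(codec, data.hex())
```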

DOS code pages

These are also ASCII-based. Most of these are included for use as OEM code pages; code page 874 is also used as an ANSI code page.

437 – IBM PC US, 8-bit SBCS extended ASCII.^[25] Known as OEM-US, the encoding of the primary built-in font of VGA graphics cards.
708 – Arabic, extended ISO 8859-6 (ASMO 708)
720 – Arabic, retaining box drawing characters in their usual locations
737 – "MS-DOS Greek". Retains all box drawing characters. More popular than 869.
775 – "MS-DOS Baltic Rim"
850 – "MS-DOS Latin 1". Full (re-arranged) repertoire of ISO 8859-1.
852 – "MS-DOS Latin 2"
855 – "MS-DOS Cyrillic". Mainly used for South Slavic languages. Includes (re-arranged) repertoire of ISO-8859-5. Not to be confused with cp866.
857 – "MS-DOS Turkish"
858 – Western European with euro sign
860 – "MS-DOS Portuguese"
861 – "MS-DOS Icelandic"
862 – "MS-DOS Hebrew"
863 – "MS-DOS French Canada"
864 – Arabic
865 – "MS-DOS Nordic"
866 – "MS-DOS Cyrillic Russian", cp866. Sole purely OEM code page (rather than ANSI or both) included as a legacy encoding in the WHATWG Encoding Standard for HTML5.
869 – "MS-DOS Greek 2", IBM869. Full (re-arranged) repertoire of ISO 8859-7.
874 – Thai, also used as the ANSI code page, extends ISO 8859-11 (and therefore TIS-620) with a few additional characters from Windows-1252. Corresponds to IBM code page 1162 (IBM-874 is similar but has different extensions).


Windows-1251 (Cyrillic ANSI code page) character table

No.	HEX	Character	No.	HEX	Character	No.	HEX	Character
000	00	NUL	086	56	V	171	AB	«
001	01	SOH	087	57	W	172	AC	¬
002	02	STX	088	58	X	173	AD	SHY
003	03	ETX	089	59	Y	174	AE	®
004	04	EOT	090	5A	Z	175	AF	Ї
005	05	ENQ	091	5B	[	176	B0	°
006	06	ACK	092	5C	\	177	B1	±
007	07	BEL	093	5D	]	178	B2	І
008	08	BS	094	5E	^	179	B3	і
009	09	TAB	095	5F	_	180	B4	ґ
010	0A	LF	096	60	`	181	B5	µ
011	0B	VT	097	61	a	182	B6	¶
012	0C	FF	098	62	b	183	B7	·
013	0D	CR	099	63	c	184	B8	ё
014	0E	SO	100	64	d	185	B9	№
015	0F	SI	101	65	e	186	BA	є
016	10	DLE	102	66	f	187	BB	»
017	11	DC1	103	67	g	188	BC	ј
018	12	DC2	104	68	h	189	BD	Ѕ
019	13	DC3	105	69	i	190	BE	ѕ
020	14	DC4	106	6A	j	191	BF	ї
021	15	NAK	107	6B	k	192	C0	А
022	16	SYN	108	6C	l	193	C1	Б
023	17	ETB	109	6D	m	194	C2	В
024	18	CAN	110	6E	n	195	C3	Г
025	19	EM	111	6F	o	196	C4	Д
026	1A	SUB	112	70	p	197	C5	Е
027	1B	ESC	113	71	q	198	C6	Ж
028	1C	FS	114	72	r	199	C7	З
029	1D	GS	115	73	s	200	C8	И
030	1E	RS	116	74	t	201	C9	Й
031	1F	US	117	75	u	202	CA	К
032	20	Space	118	76	v	203	CB	Л
033	21	!	119	77	w	204	CC	М
034	22	"	120	78	x	205	CD	Н
035	23	#	121	79	y	206	CE	О
036	24	$	122	7A	z	207	CF	П
037	25	%	123	7B	{	208	D0	Р
038	26	&	124	7C	|	209	D1	С
039	27	'	125	7D	}	210	D2	Т
040	28	(	126	7E	~	211	D3	У
041	29	)	127	7F	DEL	212	D4	Ф
042	2A	*	128	80	Ђ	213	D5	Х
043	2B	+	129	81	Ѓ	214	D6	Ц
044	2C	,	130	82	‚	215	D7	Ч
045	2D	-	131	83	ѓ	216	D8	Ш
046	2E	.	132	84	„	217	D9	Щ
047	2F	/	133	85	…	218	DA	Ъ
048	30	0	134	86	†	219	DB	Ы
049	31	1	135	87	‡	220	DC	Ь
050	32	2	136	88	€	221	DD	Э
051	33	3	137	89	‰	222	DE	Ю
052	34	4	138	8A	Љ	223	DF	Я
053	35	5	139	8B	‹	224	E0	а
054	36	6	140	8C	Њ	225	E1	б
055	37	7	141	8D	Ќ	226	E2	в
056	38	8	142	8E	Ћ	227	E3	г
057	39	9	143	8F	Џ	228	E4	д
058	3A	:	144	90	ђ	229	E5	е
059	3B	;	145	91	‘	230	E6	ж
060	3C	<	146	92	’	231	E7	з
061	3D	=	147	93	“	232	E8	и
062	3E	>	148	94	”	233	E9	й
063	3F	?	149	95	•	234	EA	к
064	40	@	150	96	–	235	EB	л
065	41	A	151	97	—	236	EC	м
066	42	B	152	98	(unassigned)	237	ED	н
067	43	C	153	99	™	238	EE	о
068	44	D	154	9A	љ	239	EF	п
069	45	E	155	9B	›	240	F0	р
070	46	F	156	9C	њ	241	F1	с
071	47	G	157	9D	ќ	242	F2	т
072	48	H	158	9E	ћ	243	F3	у
073	49	I	159	9F	џ	244	F4	ф
074	4A	J	160	A0	NBSP	245	F5	х
075	4B	K	161	A1	Ў	246	F6	ц
076	4C	L	162	A2	ў	247	F7	ч
077	4D	M	163	A3	Ј	248	F8	ш
078	4E	N	164	A4	¤	249	F9	щ
079	4F	O	165	A5	Ґ	250	FA	ъ
080	50	P	166	A6	¦	251	FB	ы
081	51	Q	167	A7	§	252	FC	ь
082	52	R	168	A8	Ё	253	FD	э
083	53	S	169	A9	©	254	FE	ю
084	54	T	170	AA	Є	255	FF	я
085	55	U
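
A table like the one above does not have to be typed by hand. The following Python sketch regenerates it from the standard-library `cp1251` codec, which is assumed here to match Windows-1251; the function name `cp1251_rows` is a hypothetical helper, and unassigned bytes are shown as the replacement character U+FFFD.

```python
def cp1251_rows():
    """Yield (decimal, hex, label) for every byte value in Windows-1251."""
    for code in range(256):
        # errors="replace" keeps going on bytes the codec leaves undefined.
        ch = bytes([code]).decode("cp1251", errors="replace")
        # Non-printable characters are labelled by code point instead of printed raw.
        label = ch if ch.isprintable() else f"U+{ord(ch):04X}"
        yield code, f"{code:02X}", label


for dec, hex_code, label in cp1251_rows():
    print(f"{dec:03d}\t{hex_code}\t{label}")
```

Generating the table from the codec avoids transcription slips such as confusing Ё with Е or upper-case with lower-case letters.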

