Not all characters are representable in Cyrillic (Windows 1251), falling back to UTF-8

I’m using Sublime Text for LaTeX, so I need to use a specific encoding. However, in some cases, when I paste text copied from a different program (Word or a browser in most cases), I get this message:

"Not all characters are representable in XXX encoding, falling back to UTF-8"

My question is: Is there any way to see which parts of the text cannot be encoded, so I can delete them manually?

Volker E.

asked Sep 9, 2014 at 10:00

I had this problem. It is caused by characters in your document that are corrupt or have no equivalent in the target encoding. Here is how I solved it.

1) Search your document for every character that is not a standard one. Make sure regular expressions are enabled in the search, then paste this:

[^a-zA-Z0-9 -\.;<>/ ={}\[\]\^\?_\\\|:\r\n@]

You can add the normal accented characters of your language to that. Here is the pattern with the characters for French and German (é, à and so on) added:

[^a-zA-Z0-9 -\.;<>/ ='{}\[\]\^\?_\\\|:\r\n~@éàèêîôâûçäöüÄÖÜß]

2) Search for that, and keep pressing F3 until you see mangled characters, usually something like «Ã¨», which is a corrupt version of «è».

3) Delete those characters or replace them with what they should be.

You will be able to convert the document to another encoding when you have cleared all corrupt characters out.
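As an alternative to stepping through regex matches, a short script can list exactly where the problem characters are. This is only a sketch: `find_unencodable` is a hypothetical helper name, and `cp1251` is Python's codec name for Windows-1251; swap in whichever encoding you are trying to save as.

```python
def find_unencodable(text, encoding="cp1251"):
    """Yield (line, column, char) for each character `encoding` cannot represent.

    Hypothetical helper: lines and columns are 1-based and count characters,
    not bytes, so they should match what a text editor displays.
    """
    for line_no, line in enumerate(text.split("\n"), start=1):
        for col_no, ch in enumerate(line, start=1):
            try:
                ch.encode(encoding)
            except UnicodeEncodeError:
                yield line_no, col_no, ch


# Made-up sample text: the Cyrillic line is fine, "é" and the snowman are not.
sample = "Привет, мир\ncafé ☃\n"
for line_no, col_no, ch in find_unencodable(sample):
    print(f"line {line_no}, column {col_no}: {ch!r} (U+{ord(ch):04X})")
```

Each reported position can then be fixed or deleted by hand in the editor.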

answered Feb 15, 2015 at 19:41

Draken

For Linux users, it is also possible to remove unconvertible characters automatically with the iconv command:

iconv -f UTF-8 -t Windows-1251 -c < ~/temp/data.csv > ~/temp/data01.csv

-c Silently discard characters that cannot be converted instead of terminating when encountering such characters.
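Where iconv is not available, roughly the same effect can be had from Python. The sketch below assumes the input really is UTF-8; `transcode` and the file names are hypothetical, and `errors="ignore"` plays the role of `-c` by dropping whatever the target encoding cannot hold.

```python
import tempfile
from pathlib import Path


def transcode(src, dst, target="cp1251", errors="ignore"):
    """Rewrite a UTF-8 file in `target`; return how many characters were dropped.

    Hypothetical helper. Bytes are read and written directly so that line
    endings are left exactly as they were.
    """
    text = Path(src).read_bytes().decode("utf-8")
    data = text.encode(target, errors=errors)
    Path(dst).write_bytes(data)
    return len(text) - len(data.decode(target))


# Demo on a throwaway file; the names below are made up for illustration.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "data.txt"
    dst = Path(tmp) / "data01.txt"
    src.write_bytes("123☃456\r\n".encode("utf-8"))
    dropped = transcode(src, dst)
    result = dst.read_bytes()

print(dropped, result)
```

Dropping characters silently loses information, so it is worth checking the returned count before trusting the output.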

answered Dec 7, 2018 at 11:30

Just adding to @Draken's response: here is the regex with Spanish characters added.

[^a-zA-Z0-9 -\.;<>/ =“”'{}\[\]\^\?_\\\|:\r\n~@àèêîôâûçäöüÄÖÜßáéíóúñÑ¿€]

In my case I pressed Ctrl+H (for replacement) and left the replacement expression empty. Everything was cleared very quickly, and I was able to save the file as ISO-8859-1.

halfer

answered Nov 17, 2017 at 18:46

Problem description

I have a file in UTF-8.
I need to save it in Windows-1251, so I choose:
File > Save with Encoding > Cyrillic (Windows 1251)
But I get a modal window with the error: «Not all characters are representable in Cyrillic (Windows 1251), falling back to UTF-8»

But I need to save it.

Preferred solution

Save anyway.
Any bad (non-representable) characters must be replaced with «?» or something similar.

Alternatives

Save anyway.
It might be good to ask the user for $array1 and $array2,
and Sublime would then apply, say, str_replace($array1, $array2, $_TEXT),
with the option to skip this and replace all bad characters with «?» without asking.
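Outside the editor, the behaviour requested here can be approximated in a few lines of Python. This is a sketch of the idea, not anything Sublime Text provides: `SUBSTITUTIONS` stands in for the $array1/$array2 pair, `encode_with_substitutions` is a hypothetical name, and `errors="replace"` makes the codec emit «?» for anything still unrepresentable.

```python
# Hypothetical substitution table, playing the role of $array1 => $array2.
SUBSTITUTIONS = {"é": "e", "→": "->", "≠": "!="}


def encode_with_substitutions(text, substitutions, encoding="cp1251"):
    """Apply user-chosen substitutions, then encode with '?' as the fallback."""
    table = str.maketrans(substitutions)  # keys must be single characters
    return text.translate(table).encode(encoding, errors="replace")


encoded = encode_with_substitutions("Привет, café → ☃", SUBSTITUTIONS)
print(encoded.decode("cp1251"))
```

Passing an empty dictionary skips the substitution step and replaces every bad character with «?» directly.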

Additional Information

Topic: How do I convert text from UTF-8 to Windows-1251? (Read 15327 times)

Givizub

What can I use to convert a text file from UTF-8 to Windows-1251, and how?
When I try to save it, Gedit says:
The document contains one or more characters that cannot be represented in the specified character encoding. Choose a different encoding and try again.
Or can this file not be saved in Windows-1251 at all?

Phlya

If the file contains characters that simply do not exist in the target encoding, there is nothing you can do except replace them.

Haron Prime

I use sublime-text-2 and have never had any problems changing the encoding.

Givizub

Not all characters are representable in Cyrillic (Windows 1251), falling back to UTF-8

Sublime Text 2 refuses as well. (It is a good text editor, though.)
Evidently there are characters that cannot be encoded.

Haron Prime

Perhaps I have simply never run into such characters, because this has never happened to me.

*And the editor is not just good, it is excellent.
For me personally it turned out to be an ideal replacement for notepad++.
I recommend getting to know it better; you will not regret it.

adawdp

- What about trying the "old-timer" Shtirlitz IV? I have it in VirtualBox, and I think it also works under Wine; I once launched, from Ubuntu, a copy of Shtirlitz that was sitting on a Windows drive.
- It supports both UTF-7 and UTF-8...
- As far as I know, Shtirlitz has been portable from the start...

Дмитрий Бо

Givizub

iconv?

iconv: illegal input sequence at position 92726
All of the text comes out looking like «��».

victor00000

Drone93,

man iconv

L~$ echo "Привет" | iconv -f utf-8 -t windows-1251 > t.t
L~$ cat t.t
��
L~$
L~$ cat t.t | iconv -t utf-8 -f windows-1251
Привет
L~$

Givizub

I do the same, and it says «iconv: illegal input sequence at position 92726».
That means windows-1251 has no such character.

The only option, it seems, is to find the character at position 92726.
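Finding that character can be scripted. The sketch below assumes the position iconv reports is a byte offset into the UTF-8 input that falls on the first byte of a character (an assumption worth checking against your iconv build); `describe_offset` is a hypothetical helper that turns such an offset into a line, a column and the character itself.

```python
def describe_offset(data, offset, source_encoding="utf-8"):
    """Map a byte offset in `data` to (line, column, char), all 1-based.

    Hypothetical helper; raises UnicodeDecodeError if `offset` falls in the
    middle of a multi-byte character.
    """
    prefix = data[:offset].decode(source_encoding)
    line = prefix.count("\n") + 1
    column = len(prefix) - (prefix.rfind("\n") + 1) + 1
    # A UTF-8 character is at most 4 bytes; ignore any truncated follower.
    ch = data[offset:offset + 4].decode(source_encoding, errors="ignore")[:1]
    return line, column, ch


# Made-up input: the snowman starts at byte 16 (12 + 1 + 3 bytes precede it).
data = "Привет\n123☃456".encode("utf-8")
print(describe_offset(data, 16))
```

For a real file, read it with `open(path, "rb").read()` and pass the position from the error message.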

ArcFi

echo '123☃456' | iconv -f UTF-8 -t CP1251//IGNORE

Samael

Givizub

And a text

echo '123☃456' | iconv -f UTF-8 -t CP1251//IGNORE

iconv -f UTF-8 -t CP1251//IGNORE Мой-документ.txt
Lots of «��» symbols
iconv: illegal input sequence at position 103541
The text document is very long (an e-book, up to a hundred pages!). It is not displayed in full in the terminal.

victor00000

Drone93,
What if you swap -f and -t, so that they become -t and -f?

Givizub

And what does enca say?

It said nothing, or I did something wrong (are windows1251 and CP1251 the same thing?):

~/Документы/Книжки$ enconv -L ukrainian -x CP1251 *.txt
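On the question of whether the two names mean the same thing: one quick check, sketched here with Python's codec registry, is to see what each name resolves to. This only shows how Python treats the names; iconv and enca keep their own alias lists, and `iconv -l` can be used to inspect the former.

```python
import codecs

# Both spellings are looked up in Python's codec registry; if they are
# aliases, the canonical names reported here will be identical.
names = ("windows-1251", "cp1251")
canonical = {name: codecs.lookup(name).name for name in names}

for name, resolved in canonical.items():
    print(f"{name} -> {resolved}")
```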

The user decided to continue the thought on 01 May 2013, 21:31:53:

Drone93,
What if you swap -f and -t, so that they become -t and -f?

The output is «РЅРµРЅР°РґРѕРІРіРѕ», and not all of the text is shown in the terminal. I need it to be saved to a file.
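Output of this kind is consistent with Cyrillic text stored as UTF-8 being read as if it were Windows-1251, which is what swapping -f and -t does to a file that is already UTF-8. The sketch below reproduces the effect and reverses it; `misdecode` and `repair` are hypothetical helper names, and the repair only works when no bytes were lost along the way.

```python
def misdecode(text, actual="utf-8", assumed="cp1251"):
    """Reproduce mojibake: bytes written in `actual`, read as `assumed`."""
    return text.encode(actual).decode(assumed)


def repair(garbled, actual="utf-8", assumed="cp1251"):
    """Undo misdecode() by retracing the same two steps backwards."""
    return garbled.encode(assumed).decode(actual)


original = "ненадовго"
garbled = misdecode(original)
print(garbled)          # every Cyrillic letter turns into a two-character pair
print(repair(garbled))  # round-trips back to the original word
```

The same pattern with `assumed="cp1252"` explains the «Ã¨»-style sequences seen in Western European text.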

« Last edit: 01 May 2013, 21:31:53 by Drone93 »
