Convert from Windows-1251 to UTF-8 in JavaScript

I need to convert a string from Windows-1251 to UTF-8.

I tried to do this with iconv, but all I get is something like this:

пїЅпїЅпїЅпїЅпїЅ пїЅпїЅпїЅпїЅпїЅпїЅ пїЅпїЅпїЅпїЅпїЅпїЅпїЅпїЅ

var iconv = new Iconv('windows-1251', 'utf-8')
title = iconv.convert(title).toString('utf-8')


asked Jan 1, 2012 at 13:52

user1125115

Here is a working solution to your problem. You have to request the body as a binary string and turn it into a Buffer before converting it.

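const Iconv = require('iconv').Iconv;
const request = require('request');

request({
    uri: website_url,
    method: 'GET',
    encoding: 'binary'
}, function (error, response, body) {
    if (error) {
        return console.error(error);
    }

    // `body` is a 'binary' (latin1) string; rebuild the raw Windows-1251 bytes from it
    const raw = Buffer.from(body, 'binary');
    const conv = new Iconv('windows-1251', 'utf8');
    // convert() returns a Buffer of UTF-8 bytes; toString() decodes it into a JS string
    const text = conv.convert(raw).toString();

});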
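Why the binary step matters can be shown with a small sketch. It uses the built-in TextDecoder instead of iconv, which assumes a Node.js build with full ICU (so that the 'windows-1251' label is recognised); the sample bytes are made up for illustration. If the bytes are decoded as UTF-8 first, every invalid byte becomes U+FFFD, and converting that damaged string "from Windows-1251" produces exactly the пїЅ pattern from the question.

```javascript
// Bytes of the word "Привет" encoded as Windows-1251 (assumed sample input).
const win1251Bytes = Buffer.from([0xcf, 0xf0, 0xe8, 0xe2, 0xe5, 0xf2]);

// Wrong: decoding as UTF-8 first replaces each invalid byte with U+FFFD.
const damaged = win1251Bytes.toString('utf8');

// Converting the damaged string "from Windows-1251" reinterprets the UTF-8
// bytes of U+FFFD (EF BF BD) as three Cyrillic characters each.
const mojibake = new TextDecoder('windows-1251').decode(Buffer.from(damaged, 'utf8'));

// Right: keep the raw bytes and decode them once, with the correct encoding.
const text = new TextDecoder('windows-1251').decode(win1251Bytes);

console.log(mojibake);
console.log(text);
```

The same reasoning applies to iconv: it needs the original bytes, not a string that was already decoded with the wrong encoding.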


answered Jan 29, 2012 at 0:20

Alex Kolarski

If you're reading from a file, you could use something like this:

const iconv = require('iconv-lite');
const fs = require("fs");

fs.readFile("filename.xml", null, (err, data) => { 
    if(err) { 
        console.log(err)
        return
    }

    const encodedData = iconv.encode(iconv.decode(data, 'win1251'), 'utf8')
    fs.writeFile("result_filename.xml", encodedData, () => { })
})
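For comparison, here is a sketch of the same file conversion without third-party packages. It assumes a Node.js build with full ICU, so that the built-in TextDecoder accepts the 'windows-1251' label; the helper name convertFileToUtf8 and the temporary file names are hypothetical.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical helper: re-encode a Windows-1251 file as UTF-8.
function convertFileToUtf8(inputPath, outputPath) {
  const bytes = fs.readFileSync(inputPath); // raw Buffer, nothing decoded yet
  const text = new TextDecoder('windows-1251').decode(bytes);
  fs.writeFileSync(outputPath, text, 'utf8'); // Node encodes the string as UTF-8
  return text;
}

// Usage with a throwaway sample file.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'cp1251-'));
const src = path.join(dir, 'in.txt');
const dst = path.join(dir, 'out.txt');
fs.writeFileSync(src, Buffer.from([0xd2, 0xe5, 0xf1, 0xf2])); // "Тест" in Windows-1251
convertFileToUtf8(src, dst);
const converted = fs.readFileSync(dst, 'utf8');
console.log(converted);
```

Once the bytes are decoded into a JavaScript string, writing that string with the 'utf8' encoding is all the "conversion to UTF-8" that is needed.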

answered Jul 14, 2021 at 18:27

Konstantin Nikolskii

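I use Node 16 and the code below works fine. You don't need to create a Buffer yourself (Node prints a deprecation warning for new Buffer()), because fs.readFile without an encoding already returns one. You need to install the iconv package first.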

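const fs = require('fs')
const Iconv = require('iconv').Iconv

fs.readFile('printed_document.txt', function (err, data) {
    if (err) {
        return console.log(err);
    }
    // `data` is a Buffer of Windows-1251 bytes; convert() returns UTF-8 bytes
    console.log(new Iconv('windows-1251', 'utf-8').convert(data).toString())
})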

answered Oct 13, 2022 at 13:44

Orlov Const

Using regular expressions is probably the best way. You can see a bunch of tests here (taken from chromium)

function validateEmail(email) {
    const re = /^(([^<>()[\]\\.,;:\s@"]+(\.[^<>()[\]\\.,;:\s@"]+)*)|(".+"))@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\])|(([a-zA-Z\-0-9]+\.)+[a-zA-Z]{2,}))$/;
    return re.test(String(email).toLowerCase());
}

Here's an example of a regular expression that accepts Unicode:

const re = /^(([^<>()[\]\.,;:\s@\"]+(\.[^<>()[\]\.,;:\s@\"]+)*)|(\".+\"))@(([^<>()[\]\.,;:\s@\"]+\.)+[^<>()[\]\.,;:\s@\"]{2,})$/i;

But keep in mind that one should not rely only upon JavaScript validation. JavaScript can easily be disabled. This should be validated on the server side as well.

Here’s an example of the above in action:

function validateEmail(email) {
  const re = /^(([^<>()[\]\\.,;:\s@\"]+(\.[^<>()[\]\\.,;:\s@\"]+)*)|(\".+\"))@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\])|(([a-zA-Z\-0-9]+\.)+[a-zA-Z]{2,}))$/;
  return re.test(email);
}

function validate() {
  const $result = $("#result");
  const email = $("#email").val();
  $result.text("");

  if (validateEmail(email)) {
    $result.text(email + " is valid :)");
    $result.css("color", "green");
  } else {
    $result.text(email + " is not valid :(");
    $result.css("color", "red");
  }
  return false;
}

$("#email").on("input", validate);
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>

<label for=email>Enter an email address:</label>
<input id="email">
<h2 id="result"></h2>

A closure is a pairing of:

  1. A function, and
  2. A reference to that function’s outer scope (lexical environment)

A lexical environment is part of every execution context (stack frame) and is a map between identifiers (i.e. local variable names) and values.

Every function in JavaScript maintains a reference to its outer lexical environment. This reference is used to configure the execution context created when a function is invoked. This reference enables code inside the function to "see" variables declared outside the function, regardless of when and where the function is called.

If a function is declared inside a function, which in turn is declared inside another function, then a chain of references to outer lexical environments is created. This chain is called the scope chain.
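A small sketch of the scope chain, showing that name lookup follows where a function is declared, not where it is called. All names here (level, outer, inner, caller) are made up for illustration.

```javascript
const level = 'global';

function outer() {
  const level = 'outer';
  function inner() {
    return level; // resolved through inner -> outer -> global
  }
  return inner;
}

function caller(fn) {
  const level = 'caller'; // local to caller; not on fn's scope chain
  return fn();
}

const seen = caller(outer());
console.log(seen); // "outer"
```

Even though inner runs while caller is on the stack, it never sees caller's variable: its outer environment was fixed when it was created inside outer.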

In the following code, inner forms a closure with the lexical environment of the execution context created when foo is invoked, closing over variable secret:

function foo() {
  const secret = Math.trunc(Math.random()*100)
  return function inner() {
    console.log(`The secret number is ${secret}.`)
  }
}
const f = foo() // `secret` is not directly accessible from outside `foo`
f() // The only way to retrieve `secret`, is to invoke `f`

In other words: in JavaScript, functions carry a reference to a private "box of state", to which only they (and any other functions declared within the same lexical environment) have access. This box of state is invisible to the caller of the function, delivering an excellent mechanism for data-hiding and encapsulation.

And remember: functions in JavaScript can be passed around like variables (first-class functions), meaning these pairings of functionality and state can be passed around your program: similar to how you might pass an instance of a class around in C++.

If JavaScript did not have closures, then more state would have to be passed between functions explicitly, making parameter lists longer and code noisier.

So, if you want a function to always have access to a private piece of state, you can use a closure.

…and frequently we do want to associate state with a function. For example, in Java or C++, when you add a private instance variable and a method to a class, you are associating state with functionality.

In C and most other common languages, after a function returns, all the local variables are no longer accessible because the stack-frame is destroyed. In JavaScript, if you declare a function within another function, then the local variables of the outer function can remain accessible after returning from it. In this way, in the code above, secret remains available to the function object inner, after it has been returned from foo.

Uses of Closures

Closures are useful whenever you need private state associated with a function. This is a very common scenario, and remember: JavaScript did not have class syntax until 2015, and private class fields (#name) only arrived in ES2022. Closures meet this need.

Private Instance Variables

In the following code, the function toString closes over the details of the car.

function Car(manufacturer, model, year, color) {
  return {
    toString() {
      return `${manufacturer} ${model} (${year}, ${color})`
    }
  }
}
const car = new Car('Aston Martin','V8 Vantage','2012','Quantum Silver')
console.log(car.toString())

Functional Programming

In the following code, the function inner closes over both fn and args.

function curry(fn) {
  const args = []
  return function inner(arg) {
    if(args.length === fn.length) return fn(...args)
    args.push(arg)
    return inner
  }
}

function add(a, b) {
  return a + b
}

const curriedAdd = curry(add)
console.log(curriedAdd(2)(3)()) // 5

Event-Oriented Programming

In the following code, function onClick closes over variable BACKGROUND_COLOR.

const $ = document.querySelector.bind(document)
const BACKGROUND_COLOR = 'rgba(200,200,242,1)'

function onClick() {
  $('body').style.background = BACKGROUND_COLOR
}

$('button').addEventListener('click', onClick)
<button>Set background color</button>

Modularization

In the following example, all the implementation details are hidden inside an immediately executed function expression. The functions tick and toString close over the private state and functions they need to complete their work. Closures have enabled us to modularise and encapsulate our code.

let namespace = {};

(function foo(n) {
  let numbers = []
  function format(n) {
    return Math.trunc(n)
  }
  function tick() {
    numbers.push(Math.random() * 100)
  }
  function toString() {
    return numbers.map(format)
  }
  n.counter = {
    tick,
    toString
  }
}(namespace))

const counter = namespace.counter
counter.tick()
counter.tick()
console.log(counter.toString())

Examples

Example 1

This example shows that the local variables are not copied in the closure: the closure maintains a reference to the original variables themselves. It is as though the stack-frame stays alive in memory even after the outer function exits.

function foo() {
  let x = 42
  let inner  = function() { console.log(x) }
  x = x+1
  return inner
}
var f = foo()
f() // logs 43

Example 2

In the following code, three methods log, increment, and update all close over the same lexical environment.

And every time createObject is called, a new execution context (stack frame) is created and a completely new variable x, and a new set of functions (log etc.) are created, that close over this new variable.

function createObject() {
  let x = 42;
  return {
    log() { console.log(x) },
    increment() { x++ },
    update(value) { x = value }
  }
}

const o = createObject()
o.increment()
o.log() // 43
o.update(5)
o.log() // 5
const p = createObject()
p.log() // 42

Example 3

If you are using variables declared using var, be careful you understand which variable you are closing over. Variables declared using var are hoisted. This is much less of a problem in modern JavaScript due to the introduction of let and const.

In the following code, each time around the loop, a new function inner is created, which closes over i. But because var i is hoisted outside the loop, all of these inner functions close over the same variable, meaning that the final value of i (3) is printed, three times.

function foo() {
  var result = []
  for (var i = 0; i < 3; i++) {
    result.push(function inner() { console.log(i) } )
  }
  return result
}

const result = foo()
// The following will print `3`, three times...
for (var i = 0; i < 3; i++) {
  result[i]() 
}
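For contrast, a sketch of the same loop written with let. Each iteration gets its own binding of i, so every function closes over a different variable. The name fooWithLet is made up, and the functions return i instead of logging it so the result is easy to inspect.

```javascript
function fooWithLet() {
  const result = [];
  for (let i = 0; i < 3; i++) {
    // A fresh `i` exists for every iteration, so each closure keeps its own value.
    result.push(function inner() { return i; });
  }
  return result;
}

const values = fooWithLet().map((fn) => fn());
console.log(values); // [ 0, 1, 2 ]
```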

Final points:

  • Whenever a function is declared in JavaScript, a closure is created.
  • Returning a function from inside another function is the classic example of closure, because the state inside the outer function is implicitly available to the returned inner function, even after the outer function has completed execution.
  • Whenever you use eval() inside a function, a closure is used. The text you eval can reference local variables of the function, and in non-strict mode, you can even create new local variables by using eval('var foo = …').
  • When you use new Function(…) (the Function constructor) inside a function, it does not close over its lexical environment: it closes over the global context instead. The new function cannot reference the local variables of the outer function.
  • A closure in JavaScript is like keeping a reference (NOT a copy) to the scope at the point of function declaration, which in turn keeps a reference to its outer scope, and so on, all the way to the global object at the top of the scope chain.
  • A closure is created when a function is declared; this closure is used to configure the execution context when the function is invoked.
  • A new set of local variables is created every time a function is called.
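The point about the Function constructor can be checked with a short sketch. The global property label is an assumption added purely for illustration, so that the constructed function has something to find in the global scope.

```javascript
// Assumed global for the demo; `new Function` bodies resolve free names here.
globalThis.label = 'global';

function make() {
  const label = 'local';
  // An ordinary function expression closes over make's lexical environment.
  const closure = function () { return label; };
  // A function built with the Function constructor only sees the global scope.
  const constructed = new Function('return label;');
  return { closure, constructed };
}

const fns = make();
console.log(fns.closure());     // "local"
console.log(fns.constructed()); // "global"
```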



I have an old site on Narod.ru, and I recently uploaded a few articles there in UTF-8, as I do everything now. The encoding was declared in a meta tag, but when I looked at the pages I saw mojibake: «Р§С‚Рѕ-то случилось.» It turns out that Narod.ru sends the HTTP header Content-Type: text/html; charset=windows-1251, and there is no way to turn that off. A visitor gets readable text only if they think to switch the encoding manually in the browser.

What to do? Move to another host? Of course, but until I got around to that I wanted a result here. Re-encode the texts? Writing a JavaScript "patch" seemed the more worthy and interesting option.

I found no way to switch the encoding from JavaScript. That left re-encoding the text with a script that runs on the document's ready event.

So, the browser receives text in UTF-8, splits the UTF-8 sequences into 8-bit groups, and treats them as character codes in Windows-1251. To make the text readable again, we need to get those codes, combine them back into UTF-8 sequences, recover the Unicode code points from them, and output the latter as HTML numeric character references. Several snags came up along the way.

First, when reading text from the innerHTML property, we find the HTML entity "&nbsp;" in place of the non-breaking space (0xA0). It has to be replaced back with 0xA0.

Second, charCodeAt returns the character's Unicode code, not its Windows-1251 code, so the former has to be converted to the latter. Third, Windows-1251 has no character with code 0x98, so the lookup yields undefined for it; this has to be handled.

Fourth, Internet Explorer and Safari do not allow the document title to be changed through the DOM, only through the corresponding document property, and HTML numeric references cannot be written there. For this case, the Unicode codes can be converted to hexadecimal, written in the form "%uXXXX", and passed through the unescape function.
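The byte arithmetic at the heart of this approach can be sketched separately from the browser specifics. The helper below is hypothetical and deliberately minimal: it takes the recovered byte values and rebuilds code points, without validating continuation bytes. The sample bytes are those hidden in the mojibake «Р§С‚Рѕ» quoted earlier.

```javascript
// Hypothetical helper: turn an array of UTF-8 byte values into Unicode code points.
function utf8BytesToCodePoints(bytes) {
  const codePoints = [];
  for (let i = 0; i < bytes.length; i++) {
    const b = bytes[i];
    if (b < 0x80) {                      // 0zzzzzzz
      codePoints.push(b);
    } else if (b >= 0xc0 && b < 0xe0) {  // 110yyyyy 10zzzzzz
      codePoints.push(((b & 0x1f) << 6) | (bytes[++i] & 0x3f));
    } else if (b >= 0xe0 && b < 0xf0) {  // 1110xxxx 10yyyyyy 10zzzzzz
      codePoints.push(((b & 0x0f) << 12) | ((bytes[++i] & 0x3f) << 6) | (bytes[++i] & 0x3f));
    } else if (b >= 0xf0 && b < 0xf8) {  // 11110www 10xxxxxx 10yyyyyy 10zzzzzz
      codePoints.push(((b & 0x07) << 18) | ((bytes[++i] & 0x3f) << 12) |
        ((bytes[++i] & 0x3f) << 6) | (bytes[++i] & 0x3f));
    }
    // Stray continuation bytes and invalid lead bytes are skipped in this sketch.
  }
  return codePoints;
}

// Windows-1251 codes of "Р§С‚Рѕ" are D0 A7 D1 82 D0 BE.
const points = utf8BytesToCodePoints([0xd0, 0xa7, 0xd1, 0x82, 0xd0, 0xbe]);
const word = String.fromCodePoint(...points);
console.log(word); // "Что"
```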

The final code looks like this:

bindReady(
	function(){

		var Win1251 =
			{
				0x0:	0x0,
				0x1:	0x1,
				0x2:	0x2,
				0x3:	0x3,
				0x4:	0x4,
				0x5:	0x5,
				0x6:	0x6,
				0x7:	0x7,
				0x8:	0x8,
				0x9:	0x9,
				0xA:	0xA,
				0xB:	0xB,
				0xC:	0xC,
				0xD:	0xD,
				0xE:	0xE,
				0xF:	0xF,
				0x10:	0x10,
				0x11:	0x11,
				0x12:	0x12,
				0x13:	0x13,
				0x14:	0x14,
				0x15:	0x15,
				0x16:	0x16,
				0x17:	0x17,
				0x18:	0x18,
				0x19:	0x19,
				0x1A:	0x1A,
				0x1B:	0x1B,
				0x1C:	0x1C,
				0x1D:	0x1D,
				0x1E:	0x1E,
				0x1F:	0x1F,
				0x20:	0x20,
				0x21:	0x21,
				0x22:	0x22,
				0x23:	0x23,
				0x24:	0x24,
				0x25:	0x25,
				0x26:	0x26,
				0x27:	0x27,
				0x28:	0x28,
				0x29:	0x29,
				0x2A:	0x2A,
				0x2B:	0x2B,
				0x2C:	0x2C,
				0x2D:	0x2D,
				0x2E:	0x2E,
				0x2F:	0x2F,
				0x30:	0x30,
				0x31:	0x31,
				0x32:	0x32,
				0x33:	0x33,
				0x34:	0x34,
				0x35:	0x35,
				0x36:	0x36,
				0x37:	0x37,
				0x38:	0x38,
				0x39:	0x39,
				0x3A:	0x3A,
				0x3B:	0x3B,
				0x3C:	0x3C,
				0x3D:	0x3D,
				0x3E:	0x3E,
				0x3F:	0x3F,
				0x40:	0x40,
				0x41:	0x41,
				0x42:	0x42,
				0x43:	0x43,
				0x44:	0x44,
				0x45:	0x45,
				0x46:	0x46,
				0x47:	0x47,
				0x48:	0x48,
				0x49:	0x49,
				0x4A:	0x4A,
				0x4B:	0x4B,
				0x4C:	0x4C,
				0x4D:	0x4D,
				0x4E:	0x4E,
				0x4F:	0x4F,
				0x50:	0x50,
				0x51:	0x51,
				0x52:	0x52,
				0x53:	0x53,
				0x54:	0x54,
				0x55:	0x55,
				0x56:	0x56,
				0x57:	0x57,
				0x58:	0x58,
				0x59:	0x59,
				0x5A:	0x5A,
				0x5B:	0x5B,
				0x5C:	0x5C,
				0x5D:	0x5D,
				0x5E:	0x5E,
				0x5F:	0x5F,
				0x60:	0x60,
				0x61:	0x61,
				0x62:	0x62,
				0x63:	0x63,
				0x64:	0x64,
				0x65:	0x65,
				0x66:	0x66,
				0x67:	0x67,
				0x68:	0x68,
				0x69:	0x69,
				0x6A:	0x6A,
				0x6B:	0x6B,
				0x6C:	0x6C,
				0x6D:	0x6D,
				0x6E:	0x6E,
				0x6F:	0x6F,
				0x70:	0x70,
				0x71:	0x71,
				0x72:	0x72,
				0x73:	0x73,
				0x74:	0x74,
				0x75:	0x75,
				0x76:	0x76,
				0x77:	0x77,
				0x78:	0x78,
				0x79:	0x79,
				0x7A:	0x7A,
				0x7B:	0x7B,
				0x7C:	0x7C,
				0x7D:	0x7D,
				0x7E:	0x7E,
				0x7F:	0x7F,
				0x402:	0x80,
				0x403:	0x81,
				0x201A:	0x82,
				0x453:	0x83,
				0x201E:	0x84,
				0x2026:	0x85,
				0x2020:	0x86,
				0x2021:	0x87,
				0x20AC:	0x88,
				0x2030:	0x89,
				0x409:	0x8A,
				0x2039:	0x8B,
				0x40A:	0x8C,
				0x40C:	0x8D,
				0x40B:	0x8E,
				0x40F:	0x8F,
				0x452:	0x90,
				0x2018:	0x91,
				0x2019:	0x92,
				0x201C:	0x93,
				0x201D:	0x94,
				0x2022:	0x95,
				0x2013:	0x96,
				0x2014:	0x97,
				0x2122:	0x99,
				0x459:	0x9A,
				0x203A:	0x9B,
				0x45A:	0x9C,
				0x45C:	0x9D,
				0x45B:	0x9E,
				0x45F:	0x9F,
				0xA0:	0xA0,
				0x40E:	0xA1,
				0x45E:	0xA2,
				0x408:	0xA3,
				0xA4:	0xA4,
				0x490:	0xA5,
				0xA6:	0xA6,
				0xA7:	0xA7,
				0x401:	0xA8,
				0xA9:	0xA9,
				0x404:	0xAA,
				0xAB:	0xAB,
				0xAC:	0xAC,
				0xAD:	0xAD,
				0xAE:	0xAE,
				0x407:	0xAF,
				0xB0:	0xB0,
				0xB1:	0xB1,
				0x406:	0xB2,
				0x456:	0xB3,
				0x491:	0xB4,
				0xB5:	0xB5,
				0xB6:	0xB6,
				0xB7:	0xB7,
				0x451:	0xB8,
				0x2116:	0xB9,
				0x454:	0xBA,
				0xBB:	0xBB,
				0x458:	0xBC,
				0x405:	0xBD,
				0x455:	0xBE,
				0x457:	0xBF,
				0x410:	0xC0,
				0x411:	0xC1,
				0x412:	0xC2,
				0x413:	0xC3,
				0x414:	0xC4,
				0x415:	0xC5,
				0x416:	0xC6,
				0x417:	0xC7,
				0x418:	0xC8,
				0x419:	0xC9,
				0x41A:	0xCA,
				0x41B:	0xCB,
				0x41C:	0xCC,
				0x41D:	0xCD,
				0x41E:	0xCE,
				0x41F:	0xCF,
				0x420:	0xD0,
				0x421:	0xD1,
				0x422:	0xD2,
				0x423:	0xD3,
				0x424:	0xD4,
				0x425:	0xD5,
				0x426:	0xD6,
				0x427:	0xD7,
				0x428:	0xD8,
				0x429:	0xD9,
				0x42A:	0xDA,
				0x42B:	0xDB,
				0x42C:	0xDC,
				0x42D:	0xDD,
				0x42E:	0xDE,
				0x42F:	0xDF,
				0x430:	0xE0,
				0x431:	0xE1,
				0x432:	0xE2,
				0x433:	0xE3,
				0x434:	0xE4,
				0x435:	0xE5,
				0x436:	0xE6,
				0x437:	0xE7,
				0x438:	0xE8,
				0x439:	0xE9,
				0x43A:	0xEA,
				0x43B:	0xEB,
				0x43C:	0xEC,
				0x43D:	0xED,
				0x43E:	0xEE,
				0x43F:	0xEF,
				0x440:	0xF0,
				0x441:	0xF1,
				0x442:	0xF2,
				0x443:	0xF3,
				0x444:	0xF4,
				0x445:	0xF5,
				0x446:	0xF6,
				0x447:	0xF7,
				0x448:	0xF8,
				0x449:	0xF9,
				0x44A:	0xFA,
				0x44B:	0xFB,
				0x44C:	0xFC,
				0x44D:	0xFD,
				0x44E:	0xFE,
				0x44F:	0xFF
			}

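		String.prototype.Win1251_charCodeAt=function(char_num){
			var char_code=this.charCodeAt(char_num);
			// Windows-1251 has no character at 0x98, so the table has no entry for it.
			// Assumption: the browser decodes that byte as U+0098, which is mapped back by hand.
			return (char_code===0x98)?0x98:Win1251[char_code];
		}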

		function utf8_decode(text){
			text=text.replace(/&nbsp;/g,"\u00A0");
			var char_code, char_code2, char_code3, char_code4;
			var result_str='';
			for(var char_num=0; char_num<text.length; char_num++)
				if((char_code=text.Win1251_charCodeAt(char_num))<0x80 || char_code===text.charCodeAt(char_num))
					result_str+=text.charAt(char_num);//0zzzzzzz - 00000000 00000000 00000000 0zzzzzzz
				else if(char_code>=0xC0)
					if(char_code<0xE0){
						if(
							(char_code2=text.Win1251_charCodeAt(++char_num))>=0x80 &&
							char_code2<0xC0
						)//110yyyyy 10zzzzzz - 00000000 00000000 00000yyy yyzzzzzz
							result_str+="&#"+((char_code-0xC0)*0x40+(char_code2-0x80))+";";
					}
					else if(char_code<0xF0){
						if(
							(char_code2=text.Win1251_charCodeAt(++char_num))>=0x80 &&
							char_code2<0xC0 &&
							(char_code3=text.Win1251_charCodeAt(++char_num))>=0x80 &&
							char_code3<0xC0
						)//1110xxxx 10yyyyyy 10zzzzzz - 00000000 00000000 xxxxyyyy yyzzzzzz
							result_str+="&#"+((char_code-0xE0)*0x1000+(char_code2-0x80)*0x40+(char_code3-0x80))+";";
					}
					else if(
						char_code<0xF8 &&
						(char_code2=text.Win1251_charCodeAt(++char_num))>=0x80 &&
						char_code2<0xC0 &&
						(char_code3=text.Win1251_charCodeAt(++char_num))>=0x80 &&
						char_code3<0xC0 &&
						(char_code4=text.Win1251_charCodeAt(++char_num))>=0x80 && char_code4<0xC0
					)//11110www 10xxxxxx 10yyyyyy 10zzzzzz - 00000000 000wwwxx xxxxyyyy yyzzzzzz
						result_str+="&#"+((char_code-0xF0)*0x40000+(char_code2-0x80)*0x1000+(char_code3-0x80)*0x40+(char_code4-0x80))+";";
		    return result_str;
		}

		function unescapeTitle(title){
			return unescape(
				utf8_decode(title).replace(
					/&#([0-9]+);/g,
					function(expression, value){
						if(isNaN(value=parseInt(value, 10)))
							return NaN;
						var i=0, retval="", radix=16;
						while(i++<4 || value>0){
							retval=["0", "1", "2", "3", "4", "5", "6", "7", "8", "9", "A", "B", "C", "D", "E", "F"][value%radix]+retval;	
							value=Math.floor(value/radix);
						}
						return "%u"+retval;
					}
				)
			);
		}

		document.body.innerHTML=utf8_decode(document.body.innerHTML);
		if("vendor" in navigator && navigator.vendor.indexOf("Apple")>-1){
			document.title=unescapeTitle(document.title);
			return;
		}
		/*@cc_on
		document.title=unescapeTitle(document.title);
		return;
		@*/
		document.getElementsByTagName("title")[0].innerHTML=utf8_decode(document.title);
	}
);

Tested in Firefox 3 and 4; Opera 9, 10 and 11; Internet Explorer 5.5, 6, 7 and 8; and the latest release versions of Google Chrome and Safari.

Of course, this is a stunt; working on it is how I figured out what UTF-8 is. Ideally, a server should not do harm with its HTTP headers. But this raises a philosophical question: who knows the document's encoding better, the server (the author of .htaccess) or the HTML document itself (its author)? Perhaps browsers have a good reason to trust the server rather than the meta tag in the document.
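On the unescape step in the script above: in engines that support ES2015, numeric character references can be expanded into a plain string directly with String.fromCodePoint. The sketch below uses a hypothetical helper name and sample input, and handles only decimal references.

```javascript
// Hypothetical helper: expand decimal numeric character references such as "&#1063;".
function expandNumericRefs(text) {
  return text.replace(/&#([0-9]+);/g, (match, digits) =>
    String.fromCodePoint(Number(digits)));
}

const title = expandNumericRefs('&#1063;&#1090;&#1086;-&#1090;&#1086;');
console.log(title); // "Что-то"
```

Unlike the %u form, this also works for code points above 0xFFFF; values above 0x10FFFF make String.fromCodePoint throw a RangeError, so untrusted input would need a range check first.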

iconv-lite: Pure JS character encoding conversion

  • No need for native code compilation. Quick to install, works on Windows, Web, and in sandboxed environments.
  • Used in popular projects like Express.js (body_parser),
    Grunt, Nodemailer, Yeoman and others.
  • Faster than node-iconv (see below for performance comparison).
  • Intuitive encode/decode API, including Streaming support.
  • In-browser usage via browserify or webpack (~180kb gzip compressed with Buffer shim included).
  • Typescript type definition file included.
  • React Native is supported (need to install stream module to enable Streaming API).
  • License: MIT.


Usage

Basic API

var iconv = require('iconv-lite');

// Convert from an encoded buffer to a js string.
str = iconv.decode(Buffer.from([0x68, 0x65, 0x6c, 0x6c, 0x6f]), 'win1251');

// Convert from a js string to an encoded buffer.
buf = iconv.encode("Sample input string", 'win1251');

// Check if encoding is supported
iconv.encodingExists("us-ascii")

Streaming API

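// The streaming examples below assume these modules are loaded:
var http = require('http');
var fs = require('fs');
var assert = require('assert');

// Decode stream (from binary data stream to js strings)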
http.createServer(function(req, res) {
    var converterStream = iconv.decodeStream('win1251');
    req.pipe(converterStream);

    converterStream.on('data', function(str) {
        console.log(str); // Do something with decoded strings, chunk-by-chunk.
    });
});

// Convert encoding streaming example
fs.createReadStream('file-in-win1251.txt')
    .pipe(iconv.decodeStream('win1251'))
    .pipe(iconv.encodeStream('ucs2'))
    .pipe(fs.createWriteStream('file-in-ucs2.txt'));

// Sugar: all encode/decode streams have .collect(cb) method to accumulate data.
http.createServer(function(req, res) {
    req.pipe(iconv.decodeStream('win1251')).collect(function(err, body) {
        assert(typeof body == 'string');
        console.log(body); // full request body string
    });
});

Supported encodings

  • All node.js native encodings: utf8, ucs2 / utf16-le, ascii, binary, base64, hex.
  • Additional unicode encodings: utf16, utf16-be, utf-7, utf-7-imap, utf32, utf32-le, and utf32-be.
  • All widespread singlebyte encodings: Windows 125x family, ISO-8859 family,
    IBM/DOS codepages, Macintosh family, KOI8 family, all others supported by iconv library.
    Aliases like ‘latin1’, ‘us-ascii’ also supported.
  • All widespread multibyte encodings: CP932, CP936, CP949, CP950, GB2312, GBK, GB18030, Big5, Shift_JIS, EUC-JP.

See all supported encodings on wiki.

Most singlebyte encodings are generated automatically from node-iconv. Thank you Ben Noordhuis and libiconv authors!

Multibyte encodings are generated from Unicode.org mappings and WHATWG Encoding Standard mappings. Thank you, respective authors!

Encoding/decoding speed

Comparison with node-iconv module (1000x256kb, on MacBook Pro, Core i5/2.6 GHz, Node v0.12.0).
Note: your results may vary, so please always check on your hardware.

operation             iconv@2.1.4   iconv-lite@0.4.7
----------------------------------------------------------
encode('win1251')     ~96 Mb/s      ~320 Mb/s
decode('win1251')     ~95 Mb/s      ~246 Mb/s

BOM handling

  • Decoding: BOM is stripped by default, unless overridden by passing stripBOM: false in options
    (f.ex. iconv.decode(buf, enc, {stripBOM: false})).
    A callback might also be given as a stripBOM parameter — it’ll be called if BOM character was actually found.
  • If you want to detect UTF-8 BOM when decoding other encodings, use node-autodetect-decoder-stream module.
  • Encoding: No BOM added, unless overridden by addBOM: true option.

UTF-16 Encodings

This library supports UTF-16LE, UTF-16BE and UTF-16 encodings. First two are straightforward, but UTF-16 is trying to be
smart about endianness in the following ways:

  • Decoding: uses BOM and ‘spaces heuristic’ to determine input endianness. Default is UTF-16LE, but can be
    overridden with defaultEncoding: 'utf-16be' option. Strips BOM unless stripBOM: false.
  • Encoding: uses UTF-16LE and writes BOM by default. Use addBOM: false to override.

UTF-32 Encodings

This library supports UTF-32LE, UTF-32BE and UTF-32 encodings. Like the UTF-16 encoding above, UTF-32 defaults to UTF-32LE, but uses BOM and ‘spaces heuristics’ to determine input endianness.

  • The default of UTF-32LE can be overridden with the defaultEncoding: 'utf-32be' option. Strips BOM unless stripBOM: false.
  • Encoding: uses UTF-32LE and writes BOM by default. Use addBOM: false to override. (defaultEncoding: 'utf-32be' can also be used here to change encoding.)

Other notes

When decoding, be sure to supply a Buffer to decode() method, otherwise bad things usually happen.
Untranslatable characters are set to � or ?. No transliteration is currently supported.
Node versions 0.10.31 and 0.11.13 are buggy, don’t use them (see #65, #77).

Testing

$ git clone git@github.com:ashtuchkin/iconv-lite.git
$ cd iconv-lite
$ npm install
$ npm test
    
$ # To view performance:
$ node test/performance.js

$ # To view test coverage:
$ npm run coverage
$ open coverage/lcov-report/index.html

An example of how to convert Windows-1251 to UTF-8 in Node.js.

This kind of conversion is useful if you scrape sites that are served in Windows-1251 and contain Russian text.

const iconv = require('iconv-lite')

// `body` is assumed to be a 'binary' string holding the raw Windows-1251 bytes
const utf8Body = iconv.encode(iconv.decode(Buffer.from(body, 'binary'), 'win1251'), 'utf8')

An example of loading a page and converting its encoding

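const request = require('request')
const cheerio = require('cheerio')
const iconv = require('iconv-lite')

request({
    uri: url,
    method: 'GET',
    encoding: 'binary'
}, function (error, response, body) {
    if (!error && response.statusCode === 200) {
        // Rebuild the raw bytes from the 'binary' string, then re-encode them as UTF-8
        const utf8Body = iconv.encode(iconv.decode(Buffer.from(body, 'binary'), 'win1251'), 'utf8')
        const $ = cheerio.load(utf8Body) // Only needed if you want jQuery-style access to the page
    }
})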
