How to convert character encoding from CP932 to UTF-8 in JavaScript (Node.js) using the nodejs-iconv module (or another solution)

I am trying to convert a string from CP932 (aka Windows-31J) to UTF-8 in JavaScript. I am crawling a site that ignores the UTF-8 charset I ask for in the request headers and returns CP932-encoded text, even though the HTML meta tag declares the page as Shift_JIS.
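Because the charset the page declares and the one it actually uses disagree here, a crawler has to pick the decoder label itself. The sketch below is a hypothetical helper (`pickCharset` and `SJIS_FAMILY` are illustrative names, not part of any library). It reads the charset from the Content-Type header, falls back to the meta tag, and maps every Shift_JIS alias to `cp932`. This assumes that such pages are really Windows-31J, which extends Shift_JIS with extra characters.

```javascript
// Hypothetical helper: choose a decoder label for a crawled page.
// Assumption: anything labelled as Shift_JIS is really served as CP932 (Windows-31J),
// so every Shift_JIS alias is normalised to 'cp932', the name libiconv/node-iconv use.
const SJIS_FAMILY = new Set([
  'shift_jis', 'shift-jis', 'sjis', 'x-sjis', 'ms_kanji',
  'csshiftjis', 'windows-31j', 'cp932', 'ms932',
]);

function pickCharset(contentType, metaCharset) {
  // Prefer the charset parameter of the HTTP Content-Type header...
  const fromHeader = /charset\s*=\s*["']?([^;"'\s]+)/i.exec(contentType || '');
  // ...and fall back to the <meta> declaration, then to UTF-8.
  const label = (fromHeader ? fromHeader[1] : (metaCharset || 'utf-8')).toLowerCase();
  return SJIS_FAMILY.has(label) ? 'cp932' : label;
}

console.log(pickCharset('text/html; charset=Shift_JIS')); // cp932
console.log(pickCharset('text/html', 'Windows-31J'));     // cp932
console.log(pickCharset('text/html; charset=UTF-8'));     // utf-8
```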

Anyway, I have the whole page stored in a string variable called "html". From there, I try to convert it to UTF-8 with this code:

    var Iconv = require('iconv').Iconv;
    var conv = new Iconv('CP932', 'UTF-8//TRANSLIT//IGNORE');
    var myBuffer = new Buffer(html.length * 3);
    myBuffer.write(html, 0, 'utf8');
    var utf8html = (conv.convert(myBuffer)).toString('utf8');
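A likely cause, offered as a diagnosis rather than a confirmed one: by the time the page is in the `html` string, the HTTP client has probably already decoded the CP932 bytes as UTF-8, turning every invalid byte into U+FFFD. Re-encoding that string cannot bring the original bytes back. The sketch below reproduces this with the standard Shift_JIS codes for the katakana ホテル. The variable names are illustrative. U+FFFD is EF BF BD in UTF-8. If a CP932 decoder skips the invalid lead byte EF, the remaining BF BD read as the half-width katakana ｿｽ, which would explain the repeated "ソ ス" in the output.

```javascript
// CP932 / Shift_JIS bytes for "ホテル" (hotel), as the server sends them.
const rawBytes = Buffer.from([0x83, 0x7a, 0x83, 0x65, 0x83, 0x8b]);

// What a client does if it decodes the body as UTF-8 before you see it:
// 0x83 and 0x8b are not valid UTF-8 lead bytes, so each becomes U+FFFD.
const html = rawBytes.toString('utf8');

// Re-encoding the string as UTF-8 (as the code above does) yields different bytes,
// so no CP932 decoder can recover "ホテル" from it.
const reencoded = Buffer.from(html, 'utf8');

console.log(html.includes('\uFFFD'));    // true: bytes were replaced
console.log(reencoded.equals(rawBytes)); // false: the original bytes are gone
console.log(reencoded.subarray(0, 3));   // <Buffer ef bf bd>: U+FFFD in UTF-8
```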

The result is not what it should be. For example, the line "投稿者さんの稚内全日空ホテルのクチコミ (感想・情報)" comes out as "ソ ス ソ ス ソ ス electronic ソ ス メ ゑ ソ ス ソ ス ソ ス ス ス ソス ソ ス t ソ ス ソ ス ソ ス S ソ ス ソ ス ソ ス ソ ス ソ ス g ソ ス e ソ ス ソ ス ソ ス フ ク ソ ス `ソ ス R ソ ス ~ (ソ ス ソ ス ソ ス zソ ス E ソ ス ソ ス ソ ス ソ ス) "

If I remove //TRANSLIT//IGNORE (which should substitute similar characters for ones that have no exact equivalent, and skip characters that cannot be converted at all), I get this error: Error: EILSEQ, Invalid character sequence.
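The choice between //IGNORE and an EILSEQ error is the usual lenient-versus-strict trade-off of any decoder. As an analogy only, the sketch below uses Node's built-in TextDecoder with UTF-8 rather than node-iconv with CP932. A lenient decoder substitutes U+FFFD for an invalid byte, while `fatal: true` throws instead. iconv's //IGNORE drops such input rather than substituting it. In the question's case, the error is consistent with the buffer holding bytes that were never valid CP932 in the first place.

```javascript
// Analogy only: Node's built-in TextDecoder, not node-iconv.
// 0x83 is a CP932 lead byte, but on its own it is invalid UTF-8.
const badBytes = new Uint8Array([0x41, 0x83, 0x42]);

// Lenient: the invalid byte is replaced with U+FFFD.
const lenient = new TextDecoder('utf-8').decode(badBytes);
console.log(lenient === 'A\uFFFDB'); // true

// Strict: the invalid byte raises an error, much like iconv's EILSEQ.
let strictError = null;
try {
  new TextDecoder('utf-8', { fatal: true }).decode(badBytes);
} catch (err) {
  strictError = err;
}
console.log(strictError instanceof TypeError); // true
```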

I am open to any solution that can be implemented in Node.js, but my searches did not turn up many options besides the nodejs-iconv module.
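One option that needs no native module is a sketch under stated assumptions, not the asker's code or node-iconv: keep the response body as raw bytes and decode it with Node's built-in TextDecoder. This assumes Node.js 18+ (for the global `fetch`) built with full ICU, which official binaries include, so that the `shift_jis` label is available. The WHATWG Encoding Standard defines its Shift_JIS decoder to include the Windows-31J extensions and maps labels such as `windows-31j` and `ms932` to it. Node delegates the actual decoding to ICU, so it is worth checking on real pages. `decodeCp932` and `fetchCp932Page` are hypothetical names.

```javascript
// Assumes Node.js 18+ with full ICU, so TextDecoder supports 'shift_jis'.

// Decode raw CP932 bytes directly; never turn them into a JS string first.
function decodeCp932(bytes) {
  return new TextDecoder('shift_jis').decode(bytes);
}

// Hypothetical crawler helper: read the body as bytes, not as text.
async function fetchCp932Page(url) {
  const res = await fetch(url);
  const bytes = new Uint8Array(await res.arrayBuffer());
  return decodeCp932(bytes);
}

// Offline check with the CP932 bytes for "ホテル".
console.log(decodeCp932(Buffer.from([0x83, 0x7a, 0x83, 0x65, 0x83, 0x8b]))); // ホテル
```

The key design point is that the bytes go straight from the network into the CP932 decoder. Any intermediate UTF-8 string loses information, as the question's output shows.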

nodejs-iconv ref: https://github.com/bnoordhuis/node-iconv

Thanks!

Edit 06.24.2011: I went ahead and implemented the solution in Java. However, I am still interested in a JavaScript solution to this problem, if someone can solve it.

3 answers

I ran into the same problem today :)
It comes down to libiconv: you need libiconv-1.13-ja-1.patch.
Please check the following.

Alternatively, you can avoid the problem by trying iconv-jp:

  npm install iconv-jp 

I had the same problem, but with CP1250. I searched everywhere, and everything else was fine; the fix was in the request call, where I had to add encoding: 'binary'.

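    var request = require('request');
    var Iconv = require('iconv').Iconv;

    request({uri: url, encoding: 'binary'}, function (err, response, body) {
      if (err) return console.error(err);
      // 'binary' keeps one character per raw byte, so the bytes can be rebuilt
      body = Buffer.from(body, 'binary');
      var iconv = new Iconv('CP1250', 'UTF-8');
      body = iconv.convert(body).toString();
      // ...
    });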
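Why encoding: 'binary' helps: Node's 'binary' encoding is an alias for 'latin1', which maps each byte 0x00 to 0xFF to exactly one character and back. The original bytes therefore survive the trip through a string. Decoding as UTF-8 (the default) replaces invalid sequences with U+FFFD and loses them. The sketch below checks this for all 256 byte values. With the request library, passing encoding: null instead returns the body as a Buffer directly, which skips the string step entirely. Note that request itself has since been deprecated.

```javascript
// Every possible byte value, 0x00 through 0xFF.
const allBytes = Buffer.from(Array.from({ length: 256 }, (_, i) => i));

// What a client hands back with encoding: 'binary' versus the UTF-8 default.
const asBinary = allBytes.toString('binary');
const asUtf8 = allBytes.toString('utf8');

// latin1/'binary' round-trips losslessly; UTF-8 does not (0x80-0xFF become U+FFFD).
console.log(Buffer.from(asBinary, 'binary').equals(allBytes)); // true
console.log(Buffer.from(asUtf8, 'utf8').equals(allBytes));     // false
```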

https://github.com/bnoordhuis/node-iconv/issues/19

I tried running /Users/Me/node_modules/iconv/test.js with node test.js, and it returns an error.

On Mac OS X Lion, this problem is related to gcc.


Source: https://habr.com/ru/post/890924/
