I am trying to scrape some Japanese websites for a personal project. Sites whose text is in UTF-8 work fine, as expected, but I cannot get any usable text from sites that declare other international encodings, EUC-JP in particular. Node also seems to interpret the text and modify it rather than passing it through raw: I tried having the response interpreted as ascii and as binary and then setting my terminal application to EUC-JP, but even then console.log() does not output the actual text.
I checked the Node documentation, and it looks like it only supports two main text encodings (apart from binary and base64).
I am using the built-in http client and setting the encoding with the response.setEncoding method, for example response.setEncoding('utf8');
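For reference, here is a minimal sketch of roughly what I am doing. The helper name fetchAsUtf8 is made up for this post, and the byte values are my own hand-typed EUC-JP encoding of 日本語 (an assumption on my part), used so that the second half does not depend on any real site.

```javascript
const http = require('http');

// Hypothetical helper (name invented for this post): fetch a page and
// let Node decode the body as UTF-8, which is what I do today.
function fetchAsUtf8(url, callback) {
  http.get(url, (response) => {
    response.setEncoding('utf8');
    let body = '';
    response.on('data', (chunk) => { body += chunk; });
    response.on('end', () => callback(null, body));
  }).on('error', callback);
}

// The same decoding step applied to bytes that I believe are "日本語"
// in EUC-JP (assumption: hand-typed, not taken from a real response).
const eucJpBytes = Buffer.from([0xc6, 0xfc, 0xcb, 0xdc, 0xb8, 0xec]);
const decoded = eucJpBytes.toString('utf8');

// The result contains U+FFFD replacement characters, not the original text.
console.log(decoded);
```

As far as I can tell, once the bytes have been turned into a string this way the original text is gone, which is why changing the terminal encoding afterwards does not help.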
How do other people work with international text in Node, especially when the source data is not in UTF-8? Are binary buffers the only way?
Although I have done a little research, I am not very knowledgeable about character encodings, so simple answers would be appreciated. Thanks!