Node.js: encoding ISO8859-1 as UTF-8

I have an application that lets users store rows in a database, and these rows can contain emoji. The problem is that an emoji such as 😊 gets stored in MySQL as ðŸ˜Š

When I fetch this string using the PHP MySQL client and display it in a web browser, it looks fine because the Content-Type is set to UTF-8. When I try to read the string in node.js, I get back what I think is the ISO8859-1 encoding, the literal ðŸ˜Š. The encoding on the table is set to latin1, and that is where I get ISO8859-1 from.

What is the correct way to decode the string in node.js so that I see the emoji, and not the encoding set by MySQL, when I console.log the string?

+6
5 answers

ðŸ˜Š is Mojibake for 😊. Interpreting the former as latin1, you get hex F09F988A, which is the UTF-8 hex for this Emoji.
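The byte-level claim can be checked in Node.js with the built-in Buffer. This is only a small sketch. Note that Node's 'latin1' encoding is strict ISO-8859-1, so bytes 9F, 98 and 8A decode to control characters rather than the Ÿ ˜ Š that a Windows-1252 reading displays. The round trip works the same way either way.

```javascript
// The emoji U+1F60A encoded as UTF-8, shown as hex.
const emojiBytes = Buffer.from('\u{1F60A}', 'utf8');
const emojiHex = emojiBytes.toString('hex').toUpperCase();
console.log(emojiHex); // F09F988A

// Decoding the same bytes one byte per character (Node's strict 'latin1')
// yields four characters instead of one: this is the Mojibake.
const misread = emojiBytes.toString('latin1');
console.log(misread.length); // 4

// Re-encoding as latin1 restores the original bytes,
// so decoding them as UTF-8 recovers the emoji.
const recovered = Buffer.from(misread, 'latin1').toString('utf8');
console.log(recovered === '\u{1F60A}'); // true
```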

(Note: UTF-8 outside MySQL is equivalent to utf8mb4 inside MySQL.)

In MySQL, you must have the column / table declared with CHARACTER SET utf8mb4. You must also state that the data being stored / retrieved is encoded as utf8mb4. Note: utf8 is not enough.
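The reason utf8 falls short is that MySQL's utf8 (an alias for utf8mb3) stores at most three bytes per character, while this Emoji needs four. A quick check in Node.js, as a sketch using only the built-in Buffer; the sample characters are chosen purely for illustration:

```javascript
// Sample characters: 'a', e-acute (U+00E9), euro sign (U+20AC), the emoji (U+1F60A).
const samples = ['a', '\u00e9', '\u20ac', '\u{1F60A}'];

// Number of bytes each character occupies when encoded as UTF-8.
const utf8Lengths = samples.map(ch => Buffer.byteLength(ch, 'utf8'));
console.log(utf8Lengths); // [ 1, 2, 3, 4 ]

// MySQL's utf8 (utf8mb3) holds at most 3 bytes per character.
const fitsInUtf8mb3 = utf8Lengths.map(n => n <= 3);
console.log(fitsInUtf8mb3); // [ true, true, true, false ]
```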

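Do a SELECT HEX(col) FROM ... to see whether you get that hex for the Emoji. If so, and the column is currently latin1, then part of the fix is to convert the column to utf8mb4. That is, the column is declared CHARACTER SET latin1 but holds UTF-8 bytes; the conversion must leave the bytes alone while fixing the charset. Assuming the column is currently VARCHAR(111) CHARACTER SET latin1 NOT NULL, do this two-step ALTER: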

ALTER TABLE tbl MODIFY COLUMN col VARBINARY(111) NOT NULL;
ALTER TABLE tbl MODIFY COLUMN col VARCHAR(111) CHARACTER SET utf8mb4 NOT NULL;

Then declare the charset when connecting from node.js:

var connection = mysql.createConnection({ ... , charset : 'utf8mb4'});
+12

. . HTML- UTF-8, UTF-8.

, latin1, . UTF-8. , , , . , UTF-8, .

UTF-8, .

→ GET HTML → POST → → SQL → → →

+3

A manual lookup-table approach:

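    // Maps each Mojibake character back to the byte it originally stood for.
    // Only the four characters of this example are listed.
    const isoToUtfTable = {
      'ð': 0xf0,
      'Ÿ': 0x9f,
      '˜': 0x98,
      'Š': 0x8a
    };
    
    // Turn the mis-decoded string back into a binary string (one char per byte).
    function convertISO8859ToUtf8(s) {
      const buf = new Uint8Array([...s].map(c => isoToUtfTable[c]));
      return String.fromCharCode(...buf);
    }
    
    // Reinterpret a binary string as UTF-8 (escape() is deprecated but still available).
    function decode_utf8(s) {
      return decodeURIComponent(escape(s));
    }
    
    console.log(decode_utf8(convertISO8859ToUtf8('ðŸ˜Š'))); // 😊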

You need to fill in the rest of isoToUtfTable (see https://en.wikipedia.org/wiki/ISO/IEC_8859-1).
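Instead of typing the table out by hand, it could be generated. The sketch below assumes the text was UTF-8 that got decoded as Windows-1252 (which is what MySQL's latin1 actually is); Windows-1252 differs from ISO-8859-1 only in bytes 0x80 to 0x9F. The names CP1252_HIGH, charToByte and fixMojibake are hypothetical, chosen for this illustration.

```javascript
// Code points that Windows-1252 assigns to bytes 0x80..0x9F, in order.
// null marks the five bytes that Windows-1252 leaves undefined.
const CP1252_HIGH = [
  0x20ac, null, 0x201a, 0x0192, 0x201e, 0x2026, 0x2020, 0x2021,
  0x02c6, 0x2030, 0x0160, 0x2039, 0x0152, null, 0x017d, null,
  null, 0x2018, 0x2019, 0x201c, 0x201d, 0x2022, 0x2013, 0x2014,
  0x02dc, 0x2122, 0x0161, 0x203a, 0x0153, null, 0x017e, 0x0178,
];

// Reverse lookup: code point -> original byte.
const charToByte = new Map();
CP1252_HIGH.forEach((cp, i) => {
  if (cp !== null) charToByte.set(cp, 0x80 + i);
});

// Hypothetical helper: turn text that was UTF-8 but got decoded as
// Windows-1252 back into the intended string.
function fixMojibake(s) {
  const bytes = [];
  for (const ch of s) {
    const cp = ch.codePointAt(0);
    if (charToByte.has(cp)) {
      bytes.push(charToByte.get(cp));
    } else if (cp <= 0xff) {
      bytes.push(cp); // ASCII and the Latin-1 range map one-to-one to bytes
    } else {
      throw new RangeError(`U+${cp.toString(16)} is not a Windows-1252 character`);
    }
  }
  return Buffer.from(bytes).toString('utf8');
}

// The four Mojibake characters from the question, written as escapes.
const repaired = fixMojibake('\u00f0\u0178\u02dc\u0160');
console.log(repaired === '\u{1F60A}'); // true
```

Throwing on characters outside Windows-1252 is a deliberate choice here: such input was not produced by this kind of mis-decoding, so silently passing it through would hide a different problem.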

+2

You can use iconv (to convert from ISO-8859-1 to UTF-8).

Taken from a gist:

var iconv = require('iconv');

function toUTF8(body) {
  // convert from iso-8859-1 to utf-8
  var ic = new iconv.Iconv('iso-8859-1', 'utf-8');
  var buf = ic.convert(body);
  return buf.toString('utf-8');
}

Here, if you pass in something encoded as ISO-8859-1, it will return it as UTF-8.

e.g.,

toUTF8("ðŸ˜Š");

will return 😊

+2

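Perhaps try looking at node-iconv.

const { Iconv } = require('iconv');

// `something` stands for a Buffer holding the ISO-8859-1 encoded bytes.
const converter = new Iconv('ISO-8859-1', 'UTF-8');
const buffer = converter.convert(something);
console.log(buffer);
console.log(buffer.toString('utf8'));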
+1

Source: https://habr.com/ru/post/1016153/

