In Node.js, how do I check that a buffer contains valid UTF-8 data?

Question

I need to check that the buffer contains valid UTF-8 data.

In Python, I can do this simply by trying to decode the bytes and checking for an exception. In the example below, I try to decode only the first byte of the encoded "¢". The exception tells me that bytes are missing.

    Python 3.3.2 (v3.3.2:d047928ae3f6, May 16 2013, 00:03:43) [MSC v.1600 32 bit (Intel)] on win32
    Type "help", "copyright", "credits" or "license" for more information.
    >>> s = '¢'
    >>> s_bytes = s.encode()
    >>> s_bytes[:1].decode()
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc2 in position 0: unexpected end of data

This approach does not work in Node.js, where decoding is much more forgiving and does not throw on invalid input:

    > s = '¢'
    '¢'
    > s_buffer = Buffer(s)
    <Buffer c2 a2>
    > s_buffer.toString('utf8', 0, 1)
    '?'

I checked the Buffer API page, but I could not find any method for validating a buffer against an encoding.
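For later readers, here is a minimal sketch of the Python-style "decode and catch" check. It assumes a more recent Node.js release in which `TextDecoder` is available as a global and supports the `fatal` option, which was not the case when this question was asked. The helper name `isValidUtf8` is made up for illustration and is not part of the Buffer API.

```javascript
// Hypothetical helper, not a built-in: returns true when buf holds
// well-formed UTF-8, false otherwise.
function isValidUtf8(buf) {
  // With { fatal: true }, decode() throws a TypeError on malformed input
  // instead of silently substituting a replacement character.
  const decoder = new TextDecoder('utf-8', { fatal: true });
  try {
    decoder.decode(buf);
    return true;
  } catch (err) {
    return false;
  }
}

const sBuffer = Buffer.from('¢', 'utf8');          // <Buffer c2 a2>
console.log(isValidUtf8(sBuffer));                 // true: complete two-byte sequence
console.log(isValidUtf8(sBuffer.subarray(0, 1)));  // false: truncated after the first byte
```

The try/catch mirrors the Python example above: a truncated or otherwise malformed sequence surfaces as an exception, which the helper turns into a boolean.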

node.js utf-8

hwiechers Sep 2 '13 at 0:06

No one has answered this question yet.
