Assuming you use only BMP characters:
function utf8Length(s) {
  var l = 0;
  for (var i = 0; i < s.length; i++) {
    var c = s.charCodeAt(i);
    if (c <= 0x007f) l += 1;                       // ASCII: 1 byte
    else if (c <= 0x07ff) l += 2;                  // U+0080..U+07FF: 2 bytes
    else if (c >= 0xd800 && c <= 0xdfff) l += 2;   // surrogate half: a pair totals 4 bytes
    else l += 3;                                   // rest of the BMP: 3 bytes
  }
  return l;
}
If you go outside the BMP (i.e., use code points above 0xFFFF), things get more complicated, because JavaScript exposes them as surrogate pairs that you need to identify ...
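To see what this means in practice, here is a short demo (the character U+10348 is just an example of a code point outside the BMP):

```javascript
var s = "𐍈"; // U+10348, a single character outside the BMP

console.log(s.length);                      // 2 — two UTF-16 code units
console.log(s.charCodeAt(0).toString(16));  // "d800" — high surrogate
console.log(s.charCodeAt(1).toString(16));  // "df48" — low surrogate
```

So a single user-perceived character shows up as two code units in `s.length` and in `charCodeAt`, which is why per-code-unit counting has to treat surrogates specially.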
Update: I updated the code above to work with all of Unicode, not just the BMP. Note that it relies on a strong assumption: that the string is well-formed UTF-16. It works by counting two bytes for each surrogate code unit found in the string; since a surrogate pair encodes to 4 bytes in UTF-8, and a surrogate should never appear outside a pair, the totals come out right.
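In modern environments you can cross-check (or replace) this logic with the standard `TextEncoder` API, which encodes the string to UTF-8 directly; a minimal sketch:

```javascript
// UTF-8 byte length via the standard TextEncoder API
// (built into browsers and Node.js, no imports needed).
function utf8LengthViaEncoder(s) {
  return new TextEncoder().encode(s).length;
}

console.log(utf8LengthViaEncoder("a"));   // 1 byte (ASCII)
console.log(utf8LengthViaEncoder("é"));   // 2 bytes (U+00E9)
console.log(utf8LengthViaEncoder("€"));   // 3 bytes (U+20AC)
console.log(utf8LengthViaEncoder("𐍈"));  // 4 bytes (U+10348, outside the BMP)
```

This is simpler and avoids the hand-rolled branch logic, at the cost of allocating the encoded buffer.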