Node.js: how many bits in a string?

Possible duplicate:
How many bytes in a JavaScript string?
String length in bytes in JavaScript

How can I calculate the number of bits in a string? What I actually need is the number of octets (8-bit bytes) in a JavaScript string (V8). If that is not possible, is there another character data structure I could use here instead of String?

UPDATE: I need the length in the UTF-8 encoding.


Assuming you use only BMP characters (the Basic Multilingual Plane, U+0000 to U+FFFF):

    /* Compute the length of the UTF-8 serialization of string s. */
    function utf8Length(s) {
      let l = 0;
      for (let i = 0; i < s.length; i++) {
        const c = s.charCodeAt(i); // one UTF-16 code unit
        if (c <= 0x007f) l += 1;                     // U+0000..U+007F: 1 byte
        else if (c <= 0x07ff) l += 2;                // U+0080..U+07FF: 2 bytes
        else if (c >= 0xd800 && c <= 0xdfff) l += 2; // surrogate: half of a 4-byte pair
        else l += 3;                                 // rest of the BMP: 3 bytes
      }
      return l;
    }
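In Node.js specifically, the runtime can also do the encoding and report the count, which is handy for cross-checking a hand-written function. This is an alternative, not part of the original answer: it assumes Node.js, where `Buffer` is a global. `Buffer.byteLength(s, 'utf8')` returns the number of UTF-8 octets, and `Buffer.from(s, 'utf8')` produces a byte container whose `length` is that same count, which is one possible answer to the "other data structure" part of the question. The helper names `utf8Bytes` and `utf8Bits` are hypothetical.

```javascript
// Sketch (assumes Node.js): let the runtime encode the string and count the octets.
// Buffer is a Node.js global, so no require() is needed.
function utf8Bytes(s) {
  return Buffer.byteLength(s, 'utf8'); // number of UTF-8 octets
}

function utf8Bits(s) {
  return utf8Bytes(s) * 8; // one octet = 8 bits
}

// A Buffer stores the encoded bytes themselves; its length is the octet count.
const buf = Buffer.from('h\u00e9llo', 'utf8');
console.log(buf.length);              // 6
console.log(utf8Bytes('h\u00e9llo')); // 6
console.log(utf8Bits('h\u00e9llo'));  // 48
```

Here "é" (U+00E9) takes 2 octets and the four ASCII letters take 1 each, hence 6 octets or 48 bits.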

If you go beyond the BMP (i.e., use characters above U+FFFF), things get more complicated: JavaScript exposes such characters as surrogate pairs, which you need to identify...
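A minimal sketch of that pair identification, as an alternative to counting each surrogate separately: it recombines every high/low surrogate pair into its code point and then applies the usual UTF-8 length ranges. The function name `utf8LengthByCodePoint` is hypothetical. Counting an unpaired surrogate as 3 bytes is an assumption, chosen to match `TextEncoder`, which replaces lone surrogates with U+FFFD (EF BF BD).

```javascript
// Sketch: UTF-8 length computed per code point rather than per UTF-16 code unit.
// Hypothetical helper name; not the answer's original function.
function utf8LengthByCodePoint(s) {
  let bytes = 0;
  for (let i = 0; i < s.length; i++) {
    const hi = s.charCodeAt(i);
    let cp = hi;
    // High surrogate (D800-DBFF) followed by low surrogate (DC00-DFFF):
    // combine the pair into one code point above U+FFFF.
    if (hi >= 0xd800 && hi <= 0xdbff && i + 1 < s.length) {
      const lo = s.charCodeAt(i + 1);
      if (lo >= 0xdc00 && lo <= 0xdfff) {
        cp = 0x10000 + ((hi - 0xd800) << 10) + (lo - 0xdc00);
        i++; // the low surrogate has been consumed
      }
    }
    if (cp <= 0x7f) bytes += 1;
    else if (cp <= 0x7ff) bytes += 2;
    else if (cp <= 0xffff) bytes += 3; // includes unpaired surrogates (assumed U+FFFD, 3 bytes)
    else bytes += 4;
  }
  return bytes;
}

console.log('\u{1F600}'.length);                 // 2 (two UTF-16 code units)
console.log(utf8LengthByCodePoint('\u{1F600}')); // 4 (one 4-byte UTF-8 sequence)
```

For well-formed input this gives the same result as counting 2 bytes per surrogate; the two approaches differ only on lone surrogates.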

Update: I have updated the code to work with all of Unicode, not just the BMP. However, it now relies on a strong assumption: that the string is valid UTF-16. It counts two bytes for each surrogate it finds. That is correct because a surrogate pair is encoded as 4 bytes in UTF-8, and in valid UTF-16 a surrogate never appears outside a pair; a lone surrogate would make the count wrong.
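Because the updated function is exact only for well-formed UTF-16, it can help to verify that assumption before trusting the count. A minimal sketch, with the hypothetical helper name `isWellFormedUtf16`: it returns `true` only when every high surrogate is immediately followed by a low surrogate and no low surrogate stands on its own. Recent Node.js versions also provide the built-in `String.prototype.isWellFormed()` for the same check.

```javascript
// Sketch: is the string well-formed UTF-16? Only then is counting
// 2 bytes per surrogate (4 per pair) the exact UTF-8 length.
function isWellFormedUtf16(s) {
  for (let i = 0; i < s.length; i++) {
    const c = s.charCodeAt(i);
    if (c >= 0xd800 && c <= 0xdbff) {
      const next = i + 1 < s.length ? s.charCodeAt(i + 1) : -1;
      if (next < 0xdc00 || next > 0xdfff) return false; // unpaired high surrogate
      i++; // skip the low half of the pair
    } else if (c >= 0xdc00 && c <= 0xdfff) {
      return false; // low surrogate with no preceding high surrogate
    }
  }
  return true;
}

console.log(isWellFormedUtf16('a\u{1F600}'));   // true
console.log(isWellFormedUtf16('\ud83d'));       // false (lone high surrogate)
console.log(isWellFormedUtf16('\ude00\ud83d')); // false (pair in the wrong order)
```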


Source: https://habr.com/ru/post/1387356/

