Assuming you use only BMP characters:
function utf8Length(s) {
  var l = 0;
  for (var i = 0; i < s.length; i++) {
    var c = s.charCodeAt(i);
    if (c <= 0x007f) l += 1;                       // ASCII: 1 byte
    else if (c <= 0x07ff) l += 2;                  // U+0080..U+07FF: 2 bytes
    else if (c >= 0xd800 && c <= 0xdfff) l += 2;   // surrogate half: a pair totals 4 bytes
    else l += 3;                                   // rest of the BMP: 3 bytes
  }
  return l;
}
If you go outside the BMP (i.e., use code points above 0xFFFF), things get more complicated, because JavaScript exposes them as surrogate pairs that you need to identify ...
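To see what this means in practice, here is a short demo (the character U+10348 is just an example of a code point outside the BMP):

```javascript
var s = "𐍈"; // U+10348, a single character outside the BMP

console.log(s.length);                      // 2 — two UTF-16 code units
console.log(s.charCodeAt(0).toString(16));  // "d800" — high surrogate
console.log(s.charCodeAt(1).toString(16));  // "df48" — low surrogate
```

So a single user-perceived character shows up as two code units in `s.length` and in `charCodeAt`, which is why per-code-unit counting has to treat surrogates specially.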
Update: I updated the code above to work with all of Unicode, not just the BMP. Note that it relies on a strong assumption: that the string is well-formed UTF-16. It works by counting two bytes for each surrogate code unit found in the string; since a surrogate pair encodes to 4 bytes in UTF-8, and a surrogate should never appear outside a pair, the totals come out right.
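In modern environments you can cross-check (or replace) this logic with the standard `TextEncoder` API, which encodes the string to UTF-8 directly; a minimal sketch:

```javascript
// UTF-8 byte length via the standard TextEncoder API
// (built into browsers and Node.js, no imports needed).
function utf8LengthViaEncoder(s) {
  return new TextEncoder().encode(s).length;
}

console.log(utf8LengthViaEncoder("a"));   // 1 byte (ASCII)
console.log(utf8LengthViaEncoder("é"));   // 2 bytes (U+00E9)
console.log(utf8LengthViaEncoder("€"));   // 3 bytes (U+20AC)
console.log(utf8LengthViaEncoder("𐍈"));  // 4 bytes (U+10348, outside the BMP)
```

This is simpler and avoids the hand-rolled branch logic, at the cost of allocating the encoded buffer.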