How to get the ASCII code from a string in JavaScript?

(Similar questions have been asked on StackOverflow, but not quite this one. The closest is probably "javascript, how to convert a unicode string to ascii", which already has a comment saying "it should be dup[licate]". I read a few similar posts, but they don't answer my specific question. I looked at the very good W3Schools and also Googled, but still couldn't find the answer. So any hints here are much appreciated.)


I have an array of bytes that is passed to a JavaScript fragment. On the JavaScript side, the data arrives as a string. I don't know the transfer mechanism, because it comes from a third-party application. I don't even know whether the string is "wide" or "narrow".

In my JavaScript, I have code like b = str.charCodeAt(pos);.

My problem is that a byte value such as 0x86 = 134 comes through as the character 0x2020 = 8224. It looks as if my source byte is being interpreted as a Latin-1 character (possibly) and then translated into the equivalent Unicode code point. (The problem may or may not lie with JavaScript itself.) Similar problems occur with other values: the ranges 0x00..0x7F and 0xA0..0xFF seem to come through accurately, but most values in 0x80..0x9F are affected, and in each case the result seems to be the Unicode code point of the corresponding source Latin-1 character.
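The particular substitution described above, byte 0x86 arriving as U+2020 (DAGGER), is what the Windows-1252 code page produces. Under strict ISO-8859-1, byte 0x86 would decode to the control character U+0086 instead. Windows-1252 also leaves five bytes in 0x80..0x9F undefined (0x81, 0x8D, 0x8F, 0x90, 0x9D), which fits the observation that most, but not all, of that range changes. Browsers that follow the WHATWG Encoding Standard treat the labels "latin1" and "iso-8859-1" as windows-1252, and that decoder passes the five undefined bytes through as U+0081 and so on. So a decoder asked for "Latin-1" can easily produce exactly this result.

The sketch below assumes, as a guess about the unknown third-party decoder, that the bytes were decoded this way. If so, a hand-written reverse table can recover them. CP1252_HIGH and stringToBytes are hypothetical names.

```javascript
// Windows-1252 bytes 0x80..0x9F and the Unicode code points they decode to.
// Assumption: the five bytes undefined in Windows-1252 (0x81, 0x8D, 0x8F,
// 0x90, 0x9D) were passed through unchanged, e.g. byte 0x81 -> U+0081.
const CP1252_HIGH = [
  0x20ac, 0x0081, 0x201a, 0x0192, 0x201e, 0x2026, 0x2020, 0x2021, // 0x80-0x87
  0x02c6, 0x2030, 0x0160, 0x2039, 0x0152, 0x008d, 0x017d, 0x008f, // 0x88-0x8F
  0x0090, 0x2018, 0x2019, 0x201c, 0x201d, 0x2022, 0x2013, 0x2014, // 0x90-0x97
  0x02dc, 0x2122, 0x0161, 0x203a, 0x0153, 0x009d, 0x017e, 0x0178, // 0x98-0x9F
];

// Reverse lookup for that range: code unit -> original byte.
const CODE_UNIT_TO_BYTE = new Map(CP1252_HIGH.map((cu, i) => [cu, 0x80 + i]));

// Recover raw bytes from a string assumed to be Windows-1252-decoded.
// Throws if a code unit could not have come from that decoding.
function stringToBytes(str) {
  const bytes = new Uint8Array(str.length);
  for (let pos = 0; pos < str.length; pos++) {
    const cu = str.charCodeAt(pos);
    if (CODE_UNIT_TO_BYTE.has(cu)) {
      bytes[pos] = CODE_UNIT_TO_BYTE.get(cu);
    } else if (cu <= 0xff && (cu < 0x80 || cu > 0x9f)) {
      // 0x00..0x7F and 0xA0..0xFF map to the same value in Windows-1252.
      bytes[pos] = cu;
    } else {
      throw new RangeError(`code unit 0x${cu.toString(16)} at ${pos} is not Windows-1252`);
    }
  }
  return bytes;
}
```

For example, stringToBytes("A\u2020\u00ff") yields the bytes 0x41, 0x86, 0xFF. Throwing on anything else is deliberate. A code unit outside the table means the assumption about the decoder is wrong, and guessing would silently corrupt binary data.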

Another observation is that the length of the string is what I would expect for a narrow string if the length were measured in bytes. (On the other hand, if length returns a value in abstract characters, this tells me nothing.)
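This observation is consistent with how JavaScript measures strings. length counts UTF-16 code units, so any decoder that turns each byte into exactly one code unit (Windows-1252 and ISO-8859-1 both do) yields a string whose length equals the byte count, whichever characters come out. The sketch contrasts that with the UTF-8 size of the same text. TextEncoder is a standard API in current browsers and Node.js and always encodes to UTF-8. The sample bytes are illustrative.

```javascript
// The bytes 0x41 0x86 0xE9 as a Windows-1252 decoder would present them:
// one UTF-16 code unit per byte.
const received = "A\u2020\u00e9"; // "A†é"

// Number of bytes the same text occupies when encoded as UTF-8.
const utf8Length = (s) => new TextEncoder().encode(s).length;

console.log(received.length); // 3, matching the original byte count
console.log(utf8Length(received)); // 6 (1 + 3 + 2 bytes)
```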

So, in JavaScript, is there a way to get the raw bytes in a string, or get the Latin-1 or ASCII character code directly, or convert the character encoding, or determine the default encoding?

I could write my own mapping, but I would prefer not to. I expect that is what I will end up doing, but it feels like a kludge.

I am also exploring whether there is anything I can configure in the calling application (since it could pass the data as a wide string, although I doubt it).

Still, I would like to know whether there is a simple JavaScript solution, or else understand why there isn't one.

(If the input were character data, getting Unicode automatically would be great, but it isn't: it's just a stream of binary data.)

Thanks.

2 answers

There is no such thing as raw bytes in a String. The ECMAScript specification defines a string as a sequence of UTF-16 code units. That is the lowest-level view of a string that any interpreter gives you.
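A character outside the Basic Multilingual Plane makes the code-unit model visible. It takes two UTF-16 code units (a surrogate pair), and charCodeAt returns each half on its own. codePointAt, added in ES2015, combines the pair into one code point. U+1F600 is just an arbitrary example.

```javascript
// U+1F600 lies outside the BMP, so UTF-16 stores it as two code units.
const s = "\u{1F600}";

console.log(s.length); // 2
console.log(s.charCodeAt(0).toString(16)); // d83d (high surrogate)
console.log(s.charCodeAt(1).toString(16)); // de00 (low surrogate)
console.log(s.codePointAt(0).toString(16)); // 1f600 (ES2015 and later)
```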

There are no encoding libraries in the browser. If you are trying to represent an array of bytes as a string and get the bytes back out, you need to roll your own.
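One hand-rolled scheme of the kind described is a "binary string": one code unit per byte, each holding a value from 0 to 255. bytesToBinaryString and binaryStringToBytes are hypothetical names for this sketch. Current browsers and Node.js also ship TextDecoder and TextEncoder (WHATWG Encoding Standard) for real character encodings, but this byte-preserving round trip does not need them.

```javascript
// Pack a byte array into a string with exactly one code unit (0..255) per byte.
function bytesToBinaryString(bytes) {
  let out = "";
  for (const b of bytes) {
    out += String.fromCharCode(b);
  }
  return out;
}

// Reverse: every code unit must be 0..255, otherwise the string was not
// produced by bytesToBinaryString.
function binaryStringToBytes(str) {
  const bytes = new Uint8Array(str.length);
  for (let i = 0; i < str.length; i++) {
    const cu = str.charCodeAt(i);
    if (cu > 0xff) {
      throw new RangeError(`code unit 0x${cu.toString(16)} at ${i} is not a byte`);
    }
    bytes[i] = cu;
  }
  return bytes;
}

const original = Uint8Array.from([0x00, 0x41, 0x86, 0xff]);
const packed = bytesToBinaryString(original);
const unpacked = binaryStringToBytes(packed);
console.log(packed.length); // 4
console.log(Array.from(unpacked).join(",")); // 0,65,134,255
```

This round trip is only lossless if nothing between the two ends re-decodes the string with another code page, which is what appears to happen in the question.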

For ASCII, charCodeAt works:

"\n".charCodeAt(0) === 10

From the JavaScript (ECMAScript) specification, http://www.ecma-international.org/publications/files/ECMA-ST/ECMA-262.pdf:

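8.4 The String Type. The String type is the set of all finite ordered sequences of zero or more 16-bit unsigned integer values ("elements"). The String type is generally used to represent textual data in a running ECMAScript program, in which case each element in the String is treated as a code unit value (see Clause 6). Each element is regarded as occupying a position within the sequence. These positions are indexed with nonnegative integers. The first element (if any) is at position 0, the next element (if any) at position 1, and so on. The length of a String is the number of elements (i.e., 16-bit values) within it. The empty String has length zero and therefore contains no elements.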

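When a String contains actual textual data, each element is considered to be a single UTF-16 code unit. Whether or not this is the actual storage format of a String, the characters within a String are numbered by their initial code unit element position as though they were represented using UTF-16. All operations on Strings (except as otherwise stated) treat them as sequences of undifferentiated 16-bit unsigned integers; they do not ensure the resulting String is in normalised form, nor do they ensure language-sensitive results.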

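NOTE The rationale behind this design was to keep the implementation of Strings as simple and high-performing as possible. The intent is that textual data coming into the execution environment from outside (e.g., user input, text read from a file or received over the network, etc.) be converted to Unicode Normalised Form C before the running program sees it. Usually this would occur at the same time incoming text is converted from its original character encoding to Unicode (and would impose no additional overhead). Since it is recommended that ECMAScript source code be in Normalised Form C, string literals are guaranteed to be normalised (if source text is guaranteed to be normalised), as long as they do not contain any Unicode escape sequences.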
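The "undifferentiated 16-bit unsigned integers" wording means a string can hold any 16-bit value, including one that is not valid UTF-16 on its own. Validity only becomes an issue when the string is encoded into bytes. The sketch uses a lone high surrogate as an illustrative value. TextEncoder, which always produces UTF-8, replaces it with U+FFFD.

```javascript
// Any 16-bit value can be stored as a string element, even one that is not
// valid UTF-16 by itself (here a lone high surrogate).
const lone = String.fromCharCode(0xd800);
console.log(lone.length); // 1
console.log(lone.charCodeAt(0) === 0xd800); // true: the value is kept as-is

// Encoding to UTF-8 is where validity matters: TextEncoder replaces the
// lone surrogate with U+FFFD, so the original value is lost.
const utf8 = new TextEncoder().encode(lone);
console.log(Array.from(utf8, (b) => b.toString(16)).join(" ")); // ef bf bd
```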

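charCodeAt(p) returns the UTF-16 code unit (a 16-bit integer) at position p. UTF-16 code units are identical to Unicode code points in the ranges U+0000..U+D7FF and U+E000..U+FFFF. Those ranges include all of Latin-1, so for a Latin-1 character charCodeAt returns its Latin-1 value.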

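So if a byte arrives as something other than its own value, the conversion to UTF-16 was done by the third-party code before the string reached you, not by charCodeAt.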

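Also note that ASCII is not the same thing as UTF-8 (although UTF-8 is a superset of ASCII). In UTF-8, every code point above 0x7F is encoded as a 2-, 3- or 4-byte sequence.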
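The 1- to 4-byte rule follows directly from the code point ranges. utf8ByteCount is a hypothetical helper, checked here against TextEncoder. It assumes its argument is a valid Unicode scalar value (not a lone surrogate).

```javascript
// Number of bytes UTF-8 uses for one Unicode code point (assumed valid).
function utf8ByteCount(codePoint) {
  if (codePoint <= 0x7f) return 1; // ASCII range: same byte in ASCII and UTF-8
  if (codePoint <= 0x7ff) return 2;
  if (codePoint <= 0xffff) return 3;
  return 4; // U+10000..U+10FFFF
}

for (const ch of ["A", "\u00e9", "\u2020", "\u{1F600}"]) {
  const cp = ch.codePointAt(0);
  const actual = new TextEncoder().encode(ch).length;
  console.log(cp.toString(16), utf8ByteCount(cp), actual);
}
// 41 1 1
// e9 2 2
// 2020 3 3
// 1f600 4 4
```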


Source: https://habr.com/ru/post/1788621/

