Difference between readAsBinaryString and readAsText using FileReader

So, as an example, when I read the π (\u03C0) character from a File using the FileReader API, I get the pi character back when I read it with FileReader.readAsText(blob), which is expected. But when I use FileReader.readAsBinaryString(blob) instead, the result is \xcf\x80, which does not seem to have any visible correlation with the pi character. What's happening? (This is probably due to how UTF-8/16 encoding works...)

2 answers

Well, if that's all you need ... :)

CF 80 is the UTF-8 encoding of π (U+03C0).
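To see where CF 80 comes from: π is U+03C0, which needs 11 bits (011 1100 0000). UTF-8 stores code points from U+0080 to U+07FF in two bytes shaped 110xxxxx 10xxxxxx, so the top five bits (01111) go into the first byte and the low six bits (000000) into the second: 11001111 10000000, i.e. CF 80. Below is a minimal sketch of a hand-rolled encoder that applies these rules to any code point. `encodeCodePointUtf8` and `toHex` are hypothetical names for illustration, not part of any API; real code would just use the built-in TextEncoder.

```typescript
// Hypothetical hand-rolled UTF-8 encoder for one code point (illustration only;
// real code would use TextEncoder). Assumes cp is a valid Unicode scalar value.
function encodeCodePointUtf8(cp: number): number[] {
  if (cp < 0x80) {
    return [cp]; // 1 byte: 0xxxxxxx
  }
  if (cp < 0x800) {
    // 2 bytes: 110xxxxx 10xxxxxx
    return [0xc0 | (cp >> 6), 0x80 | (cp & 0x3f)];
  }
  if (cp < 0x10000) {
    // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
    return [0xe0 | (cp >> 12), 0x80 | ((cp >> 6) & 0x3f), 0x80 | (cp & 0x3f)];
  }
  // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
  return [
    0xf0 | (cp >> 18),
    0x80 | ((cp >> 12) & 0x3f),
    0x80 | ((cp >> 6) & 0x3f),
    0x80 | (cp & 0x3f),
  ];
}

// Format bytes as upper-case hex pairs, e.g. [0xcf, 0x80] -> "CF 80".
function toHex(bytes: number[]): string {
  return bytes
    .map((b) => ("0" + b.toString(16)).slice(-2).toUpperCase())
    .join(" ");
}

const piBytes = encodeCodePointUtf8(0x03c0); // π
const piHex = toHex(piBytes); // "CF 80"
```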


FileReader.readAsText decodes the file's bytes using a text encoding (UTF-8 by default, unless you pass an encoding label as its second argument). Since your file is encoded in UTF-8, a single character may take several bytes. Reading it as text decodes those bytes as UTF-8, so you get your string back.
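Roughly, UTF-8 decoding works like this: the high bits of a lead byte say how many continuation bytes follow, and each continuation byte (10xxxxxx) contributes six more bits of the code point. The sketch below assumes well-formed input and does no error handling, unlike a real decoder; `decodeUtf8` is a hypothetical name. In real code, `new TextDecoder().decode(...)` on a Uint8Array does this job.

```typescript
// Hypothetical sketch of UTF-8 decoding, roughly what readAsText does for a
// UTF-8 file. Assumes well-formed input: no validation or error handling.
function decodeUtf8(bytes: number[]): string {
  let out = "";
  let i = 0;
  while (i < bytes.length) {
    const lead = bytes[i];
    let cp: number;
    let extra: number; // number of continuation bytes after the lead byte
    if (lead < 0x80) {
      cp = lead; // 0xxxxxxx
      extra = 0;
    } else if (lead < 0xe0) {
      cp = lead & 0x1f; // 110xxxxx
      extra = 1;
    } else if (lead < 0xf0) {
      cp = lead & 0x0f; // 1110xxxx
      extra = 2;
    } else {
      cp = lead & 0x07; // 11110xxx
      extra = 3;
    }
    // Each continuation byte (10xxxxxx) appends six payload bits.
    for (let k = 1; k <= extra; k++) {
      cp = (cp << 6) | (bytes[i + k] & 0x3f);
    }
    if (cp > 0xffff) {
      // Outside the BMP: JavaScript strings need a UTF-16 surrogate pair.
      const v = cp - 0x10000;
      out += String.fromCharCode(0xd800 + (v >> 10), 0xdc00 + (v & 0x3ff));
    } else {
      out += String.fromCharCode(cp);
    }
    i += extra + 1;
  }
  return out;
}

const piText = decodeUtf8([0xcf, 0x80]); // "π": two bytes, one character
```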

FileReader.readAsBinaryString, on the other hand, does exactly what its name says: it reads the file byte by byte, turning each byte into one character whose code is the byte value (0 to 255). It does not recognize multibyte characters, which is exactly what you want for binary files (basically anything that isn't a text file). Since π takes two bytes in UTF-8, your string contains the two separate bytes that make it up.
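For contrast, here is a sketch of the byte-by-byte behaviour described above: each byte becomes one character whose char code equals the byte value, with no attempt to combine bytes. The helper names are hypothetical. Because every code in such a string is in the range 0 to 255, `fromBinaryString` can recover the original bytes exactly. Those bytes could then be decoded as UTF-8 if you wanted the text after all.

```typescript
// Hypothetical helper mimicking readAsBinaryString: one character per byte,
// whose char code equals the byte value (0-255). No multibyte decoding.
function toBinaryString(bytes: number[]): string {
  let s = "";
  for (let i = 0; i < bytes.length; i++) {
    s += String.fromCharCode(bytes[i]);
  }
  return s;
}

// Inverse: recover the raw bytes from such a "binary string".
function fromBinaryString(s: string): number[] {
  const bytes: number[] = [];
  for (let i = 0; i < s.length; i++) {
    bytes.push(s.charCodeAt(i));
  }
  return bytes;
}

const piBinary = toBinaryString([0xcf, 0x80]); // UTF-8 bytes of π
// piBinary === "\xcf\x80": two characters, "Ï" (U+00CF) and the invisible
// control character U+0080, not the single character "π".
```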

You can see this difference in many places, in particular when the encoding information is lost and characters like é are displayed as Ã©.
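The é case is the same mechanism. é is U+00E9 (000 1110 1001), which UTF-8 stores as 110 00011 10 101001, i.e. C3 A9. Reading those two bytes one per character, as readAsBinaryString does (and as an ISO-8859-1 decoder would), yields U+00C3 "Ã" followed by U+00A9 "©". A minimal sketch with hypothetical helper names, limited to the two-byte range U+0080 to U+07FF:

```typescript
// Hypothetical helper: encode a code point in U+0080..U+07FF as two UTF-8
// bytes (110xxxxx 10xxxxxx). Other ranges are out of scope for this sketch.
function utf8TwoBytes(cp: number): number[] {
  return [0xc0 | (cp >> 6), 0x80 | (cp & 0x3f)];
}

// Misread bytes one per character, as readAsBinaryString does.
function misreadBytes(bytes: number[]): string {
  return bytes.map((b) => String.fromCharCode(b)).join("");
}

const eAcuteBytes = utf8TwoBytes(0x00e9); // é -> [0xC3, 0xA9]
const garbled = misreadBytes(eAcuteBytes); // "\u00c3\u00a9", displayed as "Ã©"
```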


Source: https://habr.com/ru/post/908808/

