Python - convert wide char strings from binary to Python Unicode strings

It has been a long day and I am a little puzzled.

I am reading a binary file that contains many wide-char strings, and I want to extract them as Python Unicode strings. (To unpack non-string data I use the struct module, but I cannot do the same with the strings.)

For example, after reading the word "Series":

    myfile = open("test.lei", "rb")
    myfile.seek(44)
    data = myfile.read(12)
    # data is now 'S\x00e\x00r\x00i\x00e\x00s\x00'

How can I decode the raw wide-char data into a Python Unicode string?

Edit: I am using Python 2.6

+4
4 answers
    >>> data = 'S\x00e\x00r\x00i\x00e\x00s\x00'
    >>> data.decode('utf-16')
    u'Series'
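A slightly more defensive sketch of the same idea, assuming the bytes in the file are little-endian UTF-16 without a byte order mark (which is what 'S\x00e\x00...' looks like). The helper name `read_utf16_string` and the in-memory stand-in for test.lei are hypothetical and only there for illustration. Naming the byte order explicitly avoids depending on what the plain 'utf-16' codec assumes when no BOM is present.

```python
import io

def read_utf16_string(fileobj, offset, nbytes):
    """Read nbytes at offset and decode them as little-endian UTF-16."""
    fileobj.seek(offset)
    raw = fileobj.read(nbytes)
    # 'utf-16-le' fixes the byte order, so no BOM is expected or consumed.
    return raw.decode('utf-16-le')

# Hypothetical stand-in for the real file: 44 padding bytes, then the string.
fake_file = io.BytesIO(b'\x00' * 44 + b'S\x00e\x00r\x00i\x00e\x00s\x00')
text = read_utf16_string(fake_file, 44, 12)
print(text)
```

With a real file, pass the object returned by open("test.lei", "rb") in place of `fake_file`.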
+6

If the string is known to contain no characters above U+00FF, another option, which yields a plain str rather than a unicode object, is to drop the null bytes by taking every second byte:

    >>> 'S\x00e\x00r\x00i\x00e\x00s\x00'[::2]
    'Series'
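To make the precondition of the slicing trick concrete, here is a small sketch, assuming little-endian data. It checks that every high byte is zero before trusting the slice, and shows that under that condition the result matches decoding and re-encoding as Latin-1. The variable names are illustrative only.

```python
data = b'S\x00e\x00r\x00i\x00e\x00s\x00'

# Every second byte is the low byte of a little-endian UTF-16 code unit.
low_bytes = data[::2]
high_bytes = data[1::2]

# The shortcut is only safe when every high byte is zero.
slicing_is_safe = not high_bytes.strip(b'\x00')

# Under that condition, slicing equals decoding and re-encoding as Latin-1.
via_codec = data.decode('utf-16-le').encode('latin-1')
print(low_bytes == via_codec)

# Counter-example: U+0416 is encoded as b'\x16\x04', so slicing would lose data.
cyrillic = u'\u0416'.encode('utf-16-le')
cyrillic_is_safe = not cyrillic[1::2].strip(b'\x00')
print(slicing_is_safe, cyrillic_is_safe)
```

If the check fails, decoding with a UTF-16 codec is the safer route.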
+2

I also recommend calling rstrip('\x00') after decoding. It removes all trailing '\x00' characters, unless of course you need them.

    >>> data = 'S\x00o\x00m\x00e\x00\x20\x00D\x00a\x00t\x00a\x00\x00\x00\x00\x00'
    >>> print '"%s"' % data.decode('utf-16').rstrip('\x00')
    "Some Data"

Without rstrip('\x00'), the result keeps the two trailing null characters, which may be displayed as trailing spaces:

    "Some Data  "
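One caveat: rstrip only removes nulls at the very end. If the string sits in a fixed-width field that may contain leftover bytes after the terminator, cutting at the first null is a possible variation. This is a different technique from the rstrip call above, not a replacement for it. A minimal sketch, assuming little-endian data; `decode_padded` and the `dirty` sample are hypothetical.

```python
def decode_padded(raw):
    """Decode little-endian UTF-16 and cut at the first NUL terminator."""
    text = raw.decode('utf-16-le')
    # split() with maxsplit=1 keeps only what precedes the first NUL.
    return text.split(u'\x00', 1)[0]

data = b'S\x00o\x00m\x00e\x00\x20\x00D\x00a\x00t\x00a\x00\x00\x00\x00\x00'
stripped = data.decode('utf-16-le').rstrip(u'\x00')
cut = decode_padded(data)
print(repr(stripped), repr(cut))

# With leftover bytes after the terminator, only the cut version is clean.
dirty = b'H\x00i\x00\x00\x00x\x00'
print(repr(dirty.decode('utf-16-le').rstrip(u'\x00')), repr(decode_padded(dirty)))
```

For data that is only null-padded, both approaches give the same result.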
+2

Hm, why do you say that "open" is preferable to "file"? This is what I see in the library reference (Python 2.5):

3.9 File Objects

File objects are implemented using the C stdio package and can be created with the built-in constructor file(), described in section 2.1, "Built-in Functions." (3.6)

Footnote (3.6): file() is new in Python 2.2. The older built-in open() is an alias for file().

0

Source: https://habr.com/ru/post/1308488/