Python - reading a UTF-8 encoded string one byte at a time

I have a device that returns a UTF-8 encoded string. I can only read it from the device one byte at a time, and the end of the string is signalled by a byte with the value 0x00.

I am making a Python 2.7 function to access my device and return a string.

In a previous project, when the device only returned ASCII, I used this inside a loop:

x = read_next_byte()
if x == 0:
    break
my_string += chr(x)

where x is the latest byte value read from the device.

Now the device can return a UTF-8 encoded string, but I'm not sure how to convert the bytes that I read into a UTF-8 / Unicode string.

chr(x) clearly no longer works correctly when x > 127, so I thought that using unichr(x) might work, but that assumes the value passed is the full Unicode code point of a character, whereas I only have a single byte in the range 0-255.

So, how can I convert the bytes I read from the device into a string that can be used in Python and still handle the full range of UTF-8?

Similarly, if I were given a UTF-8 string in Python, how would I break it into individual bytes to send to my device while preserving the UTF-8 encoding?

1 answer

Accumulate the bytes in a bytearray and decode them as UTF-8 once the whole string has been read (the terminating 0x00 byte is not part of the data):

mybytes = bytearray()
while True:
    x = read_next_byte()
    if x == 0:
        break
    mybytes.append(x)
my_string = mybytes.decode('utf-8')
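To see that the decode step handles multi-byte sequences correctly, here is a quick self-contained check using hard-coded sample data (the byte values are just the UTF-8 encoding of u'héllo' and are only for illustration):

# Sample data only: 0xC3 0xA9 is the two-byte UTF-8 sequence for 'é'.
sample = bytearray([0x68, 0xC3, 0xA9, 0x6C, 0x6C, 0x6F])
print(repr(sample.decode('utf-8')))  # -> u'h\xe9llo', a unicode object in Python 2.7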

As a side note: because read_next_byte takes no arguments and signals completion with a sentinel value, it is exactly the kind of C-style API the two-argument form of iter was designed for in Python, so the loop can be collapsed to a one-liner:

# If this were Python 3 code, you'd use the bytes constructor instead of bytearray
my_string = bytearray(iter(read_next_byte, 0)).decode('utf-8')
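The question also asks about the reverse direction, which the answer above does not cover. A minimal Python 2.7 sketch would be to encode the unicode string to UTF-8 and send the resulting bytes one at a time, followed by the 0x00 terminator. It assumes a hypothetical write_next_byte(b) counterpart to read_next_byte():

def send_string(my_string):
    # Encode the unicode string to its UTF-8 byte sequence;
    # iterating a bytearray yields ints in the range 0-255.
    for b in bytearray(my_string.encode('utf-8')):
        write_next_byte(b)   # hypothetical single-byte write
    write_next_byte(0)       # terminating 0x00 byte

UTF-8 never produces a 0x00 byte except for an actual NUL character, so the terminator remains unambiguous.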

Source: https://habr.com/ru/post/1655883/
