Python - reading a UTF-8 encoded string one byte at a time

I have a device that returns a UTF-8 encoded string. I can only read it from the device one byte at a time, and the end of the string is signalled by a byte with the value 0x00.

I am making a Python 2.7 function to access my device and return a string.

In a previous project, when the device only returned ASCII, I used this inside a loop:

x = read_next_byte()
if x == 0:
    break
my_string += chr(x)

where x is the latest byte value read from the device.

Now the device can return a UTF-8 encoded string, but I'm not sure how to convert the bytes that I read into a UTF-8 / Unicode string.

chr(x) clearly no longer works correctly when x > 127, so I thought that using unichr(x) might work, but that assumes the value passed is the full Unicode code point of a character, whereas I only have a single byte in the range 0-255.

So, how can I convert the bytes I read from the device into a string that can be used in Python and still handle the full range of UTF-8?

Similarly, if I were given a UTF-8 string in Python, how would I break it into individual bytes to send to my device while preserving the UTF-8 encoding?

1 answer

Accumulate the bytes in a bytearray and decode them as UTF-8 once the whole string has been read (the terminating 0x00 byte is not part of the data):

mybytes = bytearray()
while True:
    x = read_next_byte()
    if x == 0:
        break
    mybytes.append(x)
my_string = mybytes.decode('utf-8')
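To see that the decode step handles multi-byte sequences correctly, here is a quick self-contained check using hard-coded sample data (the byte values are just the UTF-8 encoding of u'héllo' and are only for illustration):

# Sample data only: 0xC3 0xA9 is the two-byte UTF-8 sequence for 'é'.
sample = bytearray([0x68, 0xC3, 0xA9, 0x6C, 0x6C, 0x6F])
print(repr(sample.decode('utf-8')))  # -> u'h\xe9llo', a unicode object in Python 2.7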

As a side note: because read_next_byte takes no arguments and signals completion with a sentinel value, it is exactly the kind of C-style API the two-argument form of iter was designed for in Python, so the loop can be collapsed to a one-liner:

# If this were Python 3 code, you'd use the bytes constructor instead of bytearray
my_string = bytearray(iter(read_next_byte, 0)).decode('utf-8')
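The question also asks about the reverse direction, which the answer above does not cover. A minimal Python 2.7 sketch would be to encode the unicode string to UTF-8 and send the resulting bytes one at a time, followed by the 0x00 terminator. It assumes a hypothetical write_next_byte(b) counterpart to read_next_byte():

def send_string(my_string):
    # Encode the unicode string to its UTF-8 byte sequence;
    # iterating a bytearray yields ints in the range 0-255.
    for b in bytearray(my_string.encode('utf-8')):
        write_next_byte(b)   # hypothetical single-byte write
    write_next_byte(0)       # terminating 0x00 byte

UTF-8 never produces a 0x00 byte except for an actual NUL character, so the terminator remains unambiguous.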

Source: https://habr.com/ru/post/1655883/
