'utf-8' codec cannot decode byte 0x89

I want to read a CSV file and process some of its columns, but I keep running into problems. I am stuck on the following error:

    Traceback (most recent call last):
      File "C:\Users\Sven\Desktop\Python\read csv.py", line 5, in <module>
        for row in reader:
      File "C:\Python34\lib\codecs.py", line 313, in decode
        (result, consumed) = self._buffer_decode(data, self.errors, final)
    UnicodeDecodeError: 'utf-8' codec can't decode byte 0x89 in position 446: invalid start byte

My code

    import csv

    with open("c:\\Users\\Sven\\Desktop\\relaties 24112014.csv", newline='', encoding="utf8") as f:
        reader = csv.reader(f, delimiter=';', quotechar='|')
        #print(sum(1 for row in reader))
        for row in reader:
            print(row)
            if row:
                value = row[6]
                value = value.replace('(', '')
                value = value.replace(')', '')
                value = value.replace(' ', '')
                value = value.replace('.', '')
                value = value.replace('0032', '0')
                if len(value) > 0:
                    print(value + ' Length: ' + str(len(value)))

I am just starting with Python and have tried searching, but it is hard to find the right solution.

Can anyone help me out?

1 answer

This is the most important clue:

invalid start byte

\x89 is not, as suggested in the comments, an invalid UTF-8 byte. It is a perfectly valid continuation byte. When it follows a suitable lead byte, the sequence is correctly encoded UTF-8:

http://hexutf8.com/?q=0xc90x89
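The same point can be checked directly in Python. This is only a small sketch: the helper name `is_continuation_byte` is made up for illustration, and the byte pair is the one from the link above.

```python
def is_continuation_byte(b):
    # UTF-8 continuation bytes have the bit pattern 10xxxxxx.
    return (b & 0xC0) == 0x80

# Lead byte 0xC9 followed by continuation byte 0x89 is valid UTF-8.
pair = b"\xc9\x89".decode("utf-8")

# The same byte on its own, where a start byte is expected, is rejected.
try:
    b"\x89".decode("utf-8")
    lone_error = None
except UnicodeDecodeError as exc:
    lone_error = exc.reason

print(is_continuation_byte(0x89), repr(pair), lone_error)
```

Here `lone_error` holds the same "invalid start byte" reason that appears in the traceback.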

Thus, either (1) your data is not UTF-8 as you expect, or (2) it is malformed UTF-8. The Python codec is simply telling you that it encountered \x89 at a position where a start byte was expected.
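To tell which case applies, one option is to decode a slice of the raw bytes with a few candidate encodings and see which result looks right. The sketch below is an illustration only: `decode_candidates`, the list of encodings, and the sample bytes are all assumptions, not something taken from the file in the question.

```python
def decode_candidates(raw, encodings=("utf-8", "cp1252", "cp850", "latin-1")):
    """Decode raw bytes with each candidate codec and collect the results."""
    results = {}
    for name in encodings:
        try:
            results[name] = raw.decode(name)
        except UnicodeDecodeError as exc:
            results[name] = "<fails: {}>".format(exc.reason)
    return results

# Hypothetical sample; in practice, open the file with mode "rb" and
# take a slice of the bytes around the position given in the error.
sample = b"Belgi\x89"
candidates = decode_candidates(sample)

for name in sorted(candidates):
    print(name, ascii(candidates[name]))
```

If one of the candidates produces sensible text, passing that name as the `encoding` argument to `open()` is the likely fix. If none does, the file may be damaged or in an encoding that is not on the list.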

(More on continuation bytes here: http://en.wikipedia.org/wiki/UTF-8#Codepage_layout )


Source: https://habr.com/ru/post/978909/

