'utf-8' codec cannot decode byte 0x89

Question


I want to read a CSV file and process some of its columns, but I keep running into problems. I'm stuck on the following error:

Traceback (most recent call last):
  File "C:\Users\Sven\Desktop\Python\read csv.py", line 5, in <module>
    for row in reader:
  File "C:\Python34\lib\codecs.py", line 313, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x89 in position 446: invalid start byte

My code

import csv

with open("c:\\Users\\Sven\\Desktop\\relaties 24112014.csv", newline='', encoding="utf8") as f:
    reader = csv.reader(f, delimiter=';', quotechar='|')
    #print(sum(1 for row in reader))
    for row in reader:
        print(row)
        if row:
            value = row[6]
            value = value.replace('(', '')
            value = value.replace(')', '')
            value = value.replace(' ', '')
            value = value.replace('.', '')
            value = value.replace('0032', '0')
            if len(value) > 0:
                print(value + ' Length: ' + str(len(value)))

I'm just starting out with Python and have tried searching, but it's hard to find the right solution.

Can anyone help me out?

+6

python csv

Sven Nov 30 '14 at 23:00


1 answer

jar · Answer 1 · 2014-12-02T06:01:04+0000

This is the most important clue:

invalid start byte

\x89 is not, as suggested in the comments, an invalid UTF-8 byte. It is a perfectly valid continuation byte: if it follows the correct lead byte, it encodes UTF-8 correctly:

http://hexutf8.com/?q=0xc90x89
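The same check can be done locally. This is only a minimal sketch (the variable names are made up for illustration): it decodes 0x89 once after the lead byte 0xC9 and once on its own, where a start byte is expected.

```python
# 0x89 directly after the lead byte 0xC9 forms a valid two-byte sequence.
pair = bytes([0xC9, 0x89])
decoded = pair.decode("utf-8")  # U+0249, LATIN SMALL LETTER J WITH STROKE

# The same byte in a position where a start byte is expected
# reproduces the error from the question.
try:
    bytes([0x89]).decode("utf-8")
    message = None
except UnicodeDecodeError as exc:
    message = str(exc)

print(repr(decoded))
print(message)
```

So the byte value itself is fine; only its position in the sequence is not.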

So either (1) you do not have UTF-8 data as you expect, or (2) you have malformed UTF-8 data. The Python codec is just telling you that it encountered \x89 at a position in the sequence where a start byte was expected.
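To tell those two cases apart, it can help to look at the raw bytes and try a few other encodings. The sketch below is only an illustration under stated assumptions: the helper names `find_bad_byte` and `decode_with_candidates` are hypothetical, the sample bytes are invented, and the candidate encodings (cp1252, cp850) are guesses, not something known about the file in the question.

```python
def find_bad_byte(raw, encoding="utf-8"):
    """Return the offset of the first undecodable byte, or None if raw decodes cleanly."""
    try:
        raw.decode(encoding)
    except UnicodeDecodeError as exc:
        return exc.start
    return None


def decode_with_candidates(raw, candidates):
    """Try each candidate encoding; map its name to the decoded text, or None on failure."""
    results = {}
    for name in candidates:
        try:
            results[name] = raw.decode(name)
        except UnicodeDecodeError:
            results[name] = None
    return results


# Invented sample: a lone 0x89 in otherwise ASCII data (not taken from the real file).
sample = b"Citro\x89n;0032"

bad_offset = find_bad_byte(sample)
results = decode_with_candidates(sample, ["utf-8", "cp1252", "cp850"])

print(bad_offset)
for name, text in results.items():
    print(name, repr(text))
```

For a real file you would read the bytes with `open(path, "rb")` and inspect the output by eye: the candidate that produces sensible text is the likely encoding, and that name can then be passed as `encoding=` to `open()` in place of "utf8". A candidate that decodes without an error is not necessarily the right one, so the result still needs checking against data you recognise.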

(More on continuation bytes here: http://en.wikipedia.org/wiki/UTF-8#Codepage_layout )
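As a rough illustration of that layout, here is a small sketch that classifies a single byte by its role in UTF-8. The function name `utf8_byte_role` and the label strings are my own; the ranges follow the standard UTF-8 byte layout.

```python
def utf8_byte_role(value):
    """Classify one byte value (0-255) by the role it can play in UTF-8."""
    if value <= 0x7F:
        return "ascii"            # 0xxxxxxx: single-byte character
    if value <= 0xBF:
        return "continuation"     # 10xxxxxx: only valid after a lead byte
    if value <= 0xC1:
        return "invalid"          # would start an overlong encoding
    if value <= 0xDF:
        return "lead-2"           # 110xxxxx: starts a 2-byte sequence
    if value <= 0xEF:
        return "lead-3"           # 1110xxxx: starts a 3-byte sequence
    if value <= 0xF4:
        return "lead-4"           # 11110xxx: starts a 4-byte sequence
    return "invalid"              # 0xF5-0xFF never appear in UTF-8


print(hex(0x89), utf8_byte_role(0x89))
print(hex(0xC9), utf8_byte_role(0xC9))
```

0x89 falls in the continuation range, which is why it is valid after 0xC9 but rejected as a start byte.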
