Reading a file in Python: byte 0xc0 causes utf-8 and ascii decode errors

Trying to read a tab-delimited file into a pandas DataFrame:

>>> df = pd.read_table(fn, na_filter=False, error_bad_lines=False)

The output looks like this:

b'Skipping line 58: expected 11 fields, saw 12\n'
Traceback (most recent call last):
...(many lines)...
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc0 in position 115: invalid start byte

Byte 0xc0 seems to be a problem for both the utf-8 and the ascii codecs:

>>> df = pd.read_table(fn, na_filter=False, error_bad_lines=False, encoding='ascii')
...(many lines)...
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc0 in position 115: ordinal not in range(128)
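Both errors can be reproduced without the file. A minimal sketch, assuming a made-up byte string that contains 0xc0 (the sample data and the helper `try_decode` are hypothetical, not from the original file):

```python
# Hypothetical sample: ASCII text with a single 0xc0 byte at position 3.
sample = b"abc\xc0def"


def try_decode(data: bytes, codec: str) -> str:
    """Return the decoded text, or a short description of the decode error."""
    try:
        return data.decode(codec)
    except UnicodeDecodeError as exc:
        # exc.reason is the part of the message after the colon,
        # exc.start is the offset of the offending byte.
        return f"{exc.reason} at position {exc.start}"


for codec in ("utf-8", "ascii"):
    print(codec, "->", try_decode(sample, codec))
# utf-8 -> invalid start byte at position 3
# ascii -> ordinal not in range(128) at position 3
```

Both codecs stop at the same byte for different reasons: 0xc0 is above 127, so it is outside ASCII, and in UTF-8 the bytes 0xc0 and 0xc1 can never start a valid sequence.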

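The file is a csv.
In OpenOffice Calc the file opens and displays, yet it contains the byte 0xc0. The problem only shows up in Python. With error_bad_lines=False pandas skips the malformed lines, but the decoding error remains, so the problem seems to be related to unicode. I also tried utf-16, utf-32, etc., without success.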

How can I read this file in Python (preferably into a pandas DataFrame) without running into the 0xc0 problem?
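Before choosing an encoding it can help to see which lines actually contain undecodable bytes. A sketch with a hypothetical helper `find_undecodable_lines`; it reads the file in binary mode, so nothing is decoded until each line is checked. Splitting on newlines like this is only safe for ASCII-compatible codecs such as utf-8, not for utf-16 or utf-32.

```python
import tempfile
from pathlib import Path


def find_undecodable_lines(path, codec="utf-8"):
    """Yield (line_number, offset_in_line, byte_value) for each line that fails to decode."""
    with open(path, "rb") as fh:  # binary mode: raw bytes, no decoding yet
        for lineno, raw in enumerate(fh, start=1):
            try:
                raw.decode(codec)
            except UnicodeDecodeError as exc:
                # exc.start is the offset of the first undecodable byte in this line
                yield lineno, exc.start, raw[exc.start]


# Demo on a small hypothetical file: line 3 starts with byte 0xc0.
with tempfile.TemporaryDirectory() as tmp:
    sample_path = Path(tmp) / "sample.tsv"
    sample_path.write_bytes(b"a\tb\nok\tfine\n\xc0x\ty\n")
    problems = list(find_undecodable_lines(sample_path))

for lineno, offset, value in problems:
    print(f"line {lineno}, offset {offset}: byte 0x{value:02x}")
# line 3, offset 0: byte 0xc0
```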


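Try specifying the encoding explicitly. If you don't know which encoding the file uses, try ISO-8859-1, which can decode 0xc0: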

encoding="ISO-8859-1"  

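In ISO-8859-1 every byte from 0x00 to 0xff is a valid character, so decoding cannot fail. Python, on the other hand, was decoding the file as utf-8 (and then ascii), and 0xc0 is invalid in both.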
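A short check of the claim that ISO-8859-1 accepts every byte. This sketch decodes all 256 byte values and looks at what 0xc0 turns into. It also shows, as a caveat, that "decodes without error" is not the same as "decodes correctly": the file's real encoding is unknown here, and in windows-1251 (used as an illustrative alternative, not as the file's actual encoding) the same byte is a Cyrillic letter.

```python
import unicodedata

# All 256 possible byte values, 0x00 through 0xff.
all_bytes = bytes(range(256))

# ISO-8859-1 maps each byte to the Unicode code point with the same number,
# so this decode never raises.
text = all_bytes.decode("ISO-8859-1")
print(len(text))                          # 256
print(unicodedata.name(text[0xC0]))       # LATIN CAPITAL LETTER A WITH GRAVE

# The mapping is one-to-one, so encoding back restores the exact bytes.
assert text.encode("ISO-8859-1") == all_bytes

# Caveat: another single-byte codec reads the same byte differently.
print(unicodedata.name(b"\xc0".decode("cp1251")))  # CYRILLIC CAPITAL LETTER A
```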

For more on ISO-8859-1, see: What is the difference between UTF-8 and ISO-8859-1?
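The core difference in two lines: ISO-8859-1 stores each character as one byte, while UTF-8 uses one to four bytes. A small sketch using the letter that 0xc0 stands for in ISO-8859-1, plus a made-up word to show what happens when UTF-8 data is read as ISO-8859-1:

```python
letter = "\u00c0"  # À, LATIN CAPITAL LETTER A WITH GRAVE

# ISO-8859-1 stores it as the single byte 0xc0 ...
print(letter.encode("ISO-8859-1"))   # b'\xc0'
# ... while UTF-8 needs two bytes, so a lone 0xc0 is not valid UTF-8.
print(letter.encode("utf-8"))        # b'\xc3\x80'

# Reading UTF-8 bytes as ISO-8859-1 does not raise; it silently garbles the text.
mojibake = "caf\u00e9".encode("utf-8").decode("ISO-8859-1")
print(mojibake == "caf\u00c3\u00a9")  # True, i.e. 'cafÃ©'
```

So ISO-8859-1 is a safe way to get the file loaded, but if the file later turns out to be UTF-8 (or another single-byte encoding), the non-ASCII characters will be wrong rather than missing.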

It works:

>>> df = pd.read_table(fn, na_filter=False, error_bad_lines=False, encoding='ISO-8859-1')

The DataFrame loads, and its contents look the same as in OpenOffice Calc. The 0xc0 bytes no longer cause an error.
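On current pandas the working call needs one change: `error_bad_lines` was deprecated in pandas 1.3 and removed in 2.0 in favour of `on_bad_lines`. A minimal sketch, assuming pandas 1.3 or later and using small made-up in-memory data (one 0xc0 byte and one row with an extra field) instead of the real file:

```python
import io

import pandas as pd

# Hypothetical tab-delimited data: header with 2 columns, a 0xc0 byte in row 2,
# and a malformed row with 3 fields.
raw = b"name\tcity\nAlice\tParis\n\xc0ngel\tMadrid\nBob\tRome\textra\n"

df = pd.read_table(
    io.BytesIO(raw),
    encoding="ISO-8859-1",  # every byte decodes, including 0xc0
    na_filter=False,
    on_bad_lines="skip",    # replacement for error_bad_lines=False
)
print(df.shape)  # (2, 2): the 3-field "Bob" row was skipped
```

Using `on_bad_lines="warn"` instead of `"skip"` still drops the malformed rows but emits a warning for each one, similar to the "Skipping line" messages in the question.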


Source: https://habr.com/ru/post/1541372/

