Pandas read_excel: 'utf-8' codec can't decode byte 0xa8 in position 14: invalid start byte

I am trying to read an MS Excel 2016 file. The file contains several sheets with data. It was downloaded from a database and opens correctly in MS Office. In the example below, I changed the file name.

EDIT: the file contains Russian and English words. Most likely Latin-1 encoding is used, but encoding='latin-1' does not help.

import pandas as pd
with open('1.xlsx', 'r', encoding='utf8') as f:
    data = pd.read_excel(f)

Result:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa8 in position 14: invalid start byte

Without encoding='utf8':

'charmap' codec can't decode byte 0x9d in position 622: character maps to <undefined>

P.S. The task is to process 52 files, combining the data on each sheet with the corresponding sheets in the other files. Therefore, please do not suggest doing it by hand.

+4
2 answers

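Most likely, the problem is with the Russian characters.

Charmap is the default codec, used when no encoding is specified.

If neither utf-8 nor latin-1 helps, then instead of

pd.read_excel(f)

try

pd.read_table(f)

or even just

f.readline()

to find out which character raises the exception, and then remove that character or characters.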
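
One way to act on this suggestion without tripping over text decoding is to look at the raw bytes first. The sketch below is only an illustration: `sniff_spreadsheet` and the two constants are names made up for this example and are not part of pandas. It opens the file in binary mode and compares the first bytes with the well-known ZIP and OLE2 signatures, on the assumption that a genuine .xlsx is a ZIP container and a legacy .xls is an OLE2 file.

```python
ZIP_MAGIC = b"PK\x03\x04"                          # start of a ZIP archive (.xlsx)
OLE2_MAGIC = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"   # start of an OLE2 file (legacy .xls)


def sniff_spreadsheet(path):
    """Guess the container format from the first bytes of the file."""
    # Binary mode: the bytes are returned as-is, no codec is involved.
    with open(path, "rb") as f:
        head = f.read(8)
    if head.startswith(ZIP_MAGIC):
        return "zip"
    if head.startswith(OLE2_MAGIC):
        return "ole2"
    return "unknown"
```

If this returns "zip" for 1.xlsx, the file is a binary archive rather than text, so no choice of text encoding in open() will make it readable as a spreadsheet. Passing the path itself, or a handle opened with 'rb', to pd.read_excel avoids the decoding step altogether. Position 14 is also where the CRC-32 field of a ZIP local file header begins, which would be consistent with the first undecodable byte showing up there.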

+2

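Pandas can take an encoding when reading your Excel file. In your case you can use:

df=pd.read_excel('your_file.xlsx',encoding='utf-8')

Or, for something more system-specific, you can use:

import sys
df=pd.read_excel('your_file.xlsx',encoding=sys.getfilesystemencoding())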
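
For the batch task described in the question (52 workbooks, with same-named sheets stacked together), a sketch along the following lines might be a starting point. It assumes that all files sit in one folder, that pandas and an .xlsx engine such as openpyxl are installed, and that sheets with the same name share the same columns. `group_by_sheet` and `combine_workbooks` are hypothetical helper names. No `encoding` argument is passed: the path is handed to pandas directly, so no text decoding is involved, and recent pandas releases may reject that keyword anyway.

```python
from collections import defaultdict
from pathlib import Path


def group_by_sheet(workbooks):
    """Regroup [{sheet: table}, ...] (one dict per file) into {sheet: [table, ...]}."""
    grouped = defaultdict(list)
    for sheets in workbooks:
        for name, table in sheets.items():
            grouped[name].append(table)
    return dict(grouped)


def combine_workbooks(folder, pattern="*.xlsx"):
    """Read every workbook in `folder` and stack the sheets that share a name."""
    import pandas as pd  # imported here so group_by_sheet works without pandas

    paths = sorted(Path(folder).glob(pattern))
    # Giving pandas the path lets it open the file in binary mode itself;
    # sheet_name=None returns a dict {sheet name: DataFrame} covering all sheets.
    workbooks = [pd.read_excel(p, sheet_name=None) for p in paths]
    return {
        name: pd.concat(frames, ignore_index=True)
        for name, frames in group_by_sheet(workbooks).items()
    }
```

Calling `combine_workbooks('data')`, where 'data' is a placeholder folder name, would return one DataFrame per sheet name, each holding the rows from every file that contains that sheet.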
+3

Source: https://habr.com/ru/post/1693268/

