I have a problem similar to the one mentioned here, but none of the suggested methods work for me.
I have a medium-sized UTF-8 .csv file with a lot of non-ASCII characters. I split the file on a specific value from one of the columns, and then I would like to save each of the resulting data frames as an .xlsx file with the characters preserved.
This does not work as I get the error message:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xff in position 7: ordinal not in range(128)
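The byte 0xff never occurs in valid UTF-8, so one way to narrow this down is to check whether the input file really decodes as UTF-8 before pandas is involved. Below is a minimal sketch, assuming Python 3; the helper names `find_invalid_utf8` and `check_file` are hypothetical and not part of any library.

```python
def find_invalid_utf8(raw):
    """Return (offset, byte value) of the first byte that breaks UTF-8
    decoding, or None if the whole byte string is valid UTF-8."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError as exc:
        # exc.start is the offset of the first undecodable byte
        return exc.start, raw[exc.start]
    return None


def check_file(path):
    """Run the same check on the raw bytes of a file (hypothetical helper)."""
    with open(path, "rb") as fh:
        return find_invalid_utf8(fh.read())


# In-memory demonstration: valid UTF-8 text passes, a stray 0xff is reported.
print(find_invalid_utf8("Wrocław".encode("utf-8")))
print(find_invalid_utf8(b"abc\xffdef"))
```

If `check_file('path-to-file.csv')` returns `None`, the file itself is valid UTF-8 and the offending byte is more likely introduced somewhere in the write path.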
Here is what I tried:
1. Using the xlsxwriter engine explicitly. It doesn't seem to change anything.
2. Defining a function (below) that changes the encoding and throws out the bad characters. It does not change anything either.

        def changeencode(data):
            cols = data.columns
            for col in cols:
                if data[col].dtype == 'O':
                    data[col] = data[col].str.decode('utf-8').str.encode('ascii', 'ignore')
            return data

3. Manually replacing all offending characters with others. Still no effect (the error quoted above was obtained after this change).
4. Encoding the file as UTF-16 (which, I believe, is the correct encoding, since I want to be able to manipulate the file from Excel afterwards) does not help either.
I believe the problem is in the file itself (because of 2 and 3), but I have no idea how to get around it. I would appreciate any help. The beginning of the file is pasted below.
"Submitted","your-name","youremail","phone","miasto","cityCF","innemiasto","languagesCF","morelanguages","wiek","partnerCF","messageCF","acceptance-795","Submitted Login","Submitted From",
"2015-12-25 14:07:58 +00:00","Zózia kryś"," test@tes.pl ","4444444","Wrocław","","testujemy polskie znaki","Polski","testujemy polskie znaki","44","test","test","1","Justyna","99.111.155.132",
EDIT
Some code (one of the versions, without the splitting part):
    import pandas as pd
    import string
    import xlsxwriter

    df = pd.read_csv('path-to-file.csv')
    with pd.ExcelWriter('test.xlsx') as writer:
        df.to_excel(writer, sheet_name='sheet1', engine='xlsxwriter')
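The code above leaves out the splitting step. A minimal sketch of that step follows, assuming Python 3 (where pandas strings are already Unicode, so no manual encode/decode is involved) and assuming the split column is `miasto`. The helper names `split_by_column` and `save_groups` and the `part_N.xlsx` file naming are hypothetical. It illustrates the intended workflow and is not a confirmed fix for the error.

```python
import pandas as pd


def split_by_column(df, column):
    """Return a dict mapping each distinct value in `column` to its sub-frame.
    Rows whose value in `column` is missing (NaN) are dropped by groupby."""
    return {key: group for key, group in df.groupby(column)}


def save_groups(groups, prefix="part"):
    """Write each sub-frame to its own .xlsx file (requires xlsxwriter)."""
    for i, group in enumerate(groups.values()):
        # Files are numbered instead of named after the group key, because
        # the key may contain characters that are not valid in file names.
        group.to_excel(f"{prefix}_{i}.xlsx", sheet_name="sheet1",
                       index=False, engine="xlsxwriter")


# Small in-memory example standing in for pd.read_csv('path-to-file.csv', encoding='utf-8')
df = pd.DataFrame({"miasto": ["Wrocław", "Kraków", "Wrocław"],
                   "wiek": [44, 30, 25]})
groups = split_by_column(df, "miasto")
print(sorted(groups))
```

Calling `save_groups(groups)` would then write one workbook per distinct value of `miasto`.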