Best way to convert Unicode characters in a CSV to plain text?

I have a large CSV file containing Unicode characters that cause errors in the Python script I'm trying to run. My process for removing them has been rather tedious so far. I run my script, and as soon as it hits a Unicode character, I get an error message:

'ascii' codec can't encode character u'\xef' in position 197: ordinal not in range(128)

Then I google u'\xef' and try to figure out what the character really is (does anyone know a website with a list of these definitions? a programmatic alternative is sketched after the script below). I use that information to extend a dictionary in a second Python script that converts the Unicode characters to plain text:

import csv
import glob
import os

# Raw byte values from the file mapped to plain-ASCII replacements.
unicode_dict = {"\xb0": "deg", "\xa0": " ", "\xbd": "1/2", "\xbc": "1/4", "\xb2": "^2", "\xbe": "3/4"}

for f in glob.glob(r"C:\Folder1\*.csv"):
    in_csv = f
    out_csv = f.replace(".csv", "_2.csv")

    write_f = open(out_csv, "wb")
    writer = csv.writer(write_f)

    with open(in_csv, 'rb') as csvfile:
        reader = csv.reader(csvfile)
        for row in reader:
            new_row = []
            for s in row:
                # Swap every known non-ASCII byte for its ASCII stand-in.
                for k, v in unicode_dict.iteritems():
                    s = s.replace(k, v)
                new_row.append(s)
            writer.writerow(new_row)

    write_f.close()
    # Replace the original file with the cleaned copy.
    os.remove(in_csv)
    os.rename(out_csv, in_csv)
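
For reference, the lookup itself doesn't need a website: Python's standard unicodedata module can name a code point directly. A tiny Python 2 example (the "latin-1" decode is an assumption; it simply maps each byte to the code point with the same number):

import unicodedata

# Decode the offending byte to a unicode character, then ask for its official name.
ch = "\xbd".decode("latin-1")     # u'\xbd'
print unicodedata.name(ch)        # VULGAR FRACTION ONE HALF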

Then I have to run the script again, hit another error, and google the next Unicode character. There must be a better way, right?


Start by reading http://www.joelonsoftware.com/articles/Unicode.html.

Your data isn't garbage that needs stripping: a byte like \xbd is a perfectly valid character in whatever encoding the file was saved in. The real problem is that the file is being read as if it were ASCII.

Open it with io.open(in_csv, 'r', encoding='yourencodinghere') instead of the plain built-in open, so the bytes are decoded with the correct encoding up front.

Bear in mind, though, that the Python 2 csv module doesn't support Unicode input directly, so you will have to work around that (for example by re-encoding the decoded text). As SBillion already suggested (again, http://www.joelonsoftware.com/articles/Unicode.html), the first step is knowing which encoding your file is actually in.
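
For example, here is a minimal sketch of that approach, assuming Python 2 and assuming the files are in cp1252 (substitute whatever encoding they were really saved in). Because Python 2's csv module works on byte strings rather than unicode, the sketch decodes and re-encodes each cell instead of wrapping the file in io.open:

import csv
import glob

SOURCE_ENCODING = "cp1252"  # assumption: the encoding the CSVs were actually written in

for path in glob.glob(r"C:\Folder1\*.csv"):
    out_path = path.replace(".csv", "_utf8.csv")
    with open(path, "rb") as src, open(out_path, "wb") as dst:
        writer = csv.writer(dst)
        for row in csv.reader(src):
            # Decode each cell with the real encoding, then re-encode as UTF-8,
            # which the csv writer handles as plain byte strings.
            writer.writerow([cell.decode(SOURCE_ENCODING).encode("utf-8") for cell in row])

If the downstream script really does need pure ASCII, the replacement dictionary from the question can then be applied to the decoded unicode values instead of to raw bytes.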

