Python error: UnicodeEncodeError: 'ascii' codec can't encode character u'\u2026'

I am trying to extract some data from a JSON file that contains tweets and write it to a CSV file. The file contains all kinds of characters, and I assume that is why I get this error message:

UnicodeEncodeError: 'ascii' codec can't encode character u'\u2026'
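The message means Python tried to turn a Unicode string into bytes with the ASCII codec, which only covers code points 0 to 127. U+2026 is the horizontal ellipsis (…), which is common in truncated tweets. The question uses Python 2, where this conversion happens implicitly inside the csv module. Below is a minimal Python 3 reproduction in which the encoding is explicit. The helper name `try_ascii` and the sample text are made up for illustration.

```python
def try_ascii(text):
    """Return None if text encodes as ASCII, else the UnicodeEncodeError raised."""
    try:
        text.encode("ascii")
    except UnicodeEncodeError as err:
        return err
    return None

# Hypothetical tweet text ending in U+2026 (HORIZONTAL ELLIPSIS)
err = try_ascii("Wait for it\u2026")

# err.encoding is the codec name, err.start the index of the first bad character
print(err.encoding, err.start)
print(err)
```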

I think I need to convert the output to UTF-8 before writing the CSV file, but I could not get it to work. I found similar questions here on Stack Overflow, but could not adapt the solutions to my problem. (I should add that I am not very familiar with Python; I am a sociologist, not a programmer.)
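Converting to UTF-8 works where ASCII fails because UTF-8 can represent every Unicode code point, using one to four bytes per character. The ellipsis becomes three bytes. Here is a short Python 3 illustration with a made-up string:

```python
text = "Wait for it\u2026"

# UTF-8 can encode any Unicode character; U+2026 becomes three bytes.
data = text.encode("utf-8")
print(data)  # b'Wait for it\xe2\x80\xa6'

# Decoding with the same codec restores the original string.
restored = data.decode("utf-8")
print(restored == text)
```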

import csv
import json

fieldnames = ['id', 'text']

with open('MY_SOURCE_FILE', 'r') as f, open('MY_OUTPUT', 'a') as out:

    writer = csv.DictWriter(
                    out, fieldnames=fieldnames, delimiter=',', quoting=csv.QUOTE_ALL)

    for line in f:
        tweet = json.loads(line)
        user = tweet['user']
        output = {
            'text': tweet['text'],
            'id': tweet['id'],
        }
        writer.writerow(output)
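The non-ASCII characters reach the CSV writer through `json.loads`. A JSON file may store such characters literally or as `\uXXXX` escapes. In both cases, parsing returns real Unicode characters in the resulting string. This Python 3 sketch uses a hypothetical tweet line, not one from the actual data:

```python
import json

# Hypothetical line from a tweets file; the ellipsis is stored as a \u escape.
line = '{"id": 1, "text": "Read more\\u2026", "user": {"screen_name": "example"}}'

tweet = json.loads(line)

# The six-character escape has been decoded into one non-ASCII character.
print(len(tweet["text"]))
print(tweet["text"][-1] == "\u2026")
```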
1 answer

You need to encode the text as UTF-8 before writing it:

for line in f:
    tweet = json.loads(line)
    user = tweet['user']
    output = {
        # Python 2's csv module writes byte strings, so encode the unicode text first
        'text': tweet['text'].encode("utf-8"),
        'id': tweet['id'],
    }
    writer.writerow(output)
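The `.encode("utf-8")` call is a Python 2 fix. In Python 2 the csv module writes byte strings, so unicode values have to be encoded first. In Python 3 the csv module works with `str` directly. The usual approach there is to open both files with an explicit `encoding="utf-8"` and to open the output with `newline=""`, as the Python 3 csv documentation recommends. The sketch below follows those assumptions. The function name `tweets_to_csv` is hypothetical. It takes already-open file objects so that it can be exercised with `io.StringIO`.

```python
import csv
import io
import json

FIELDNAMES = ["id", "text"]

def tweets_to_csv(src, out):
    """Copy id and text from JSON-lines tweets in src into CSV rows in out.

    Python 3 sketch: out should be a text stream opened with
    encoding="utf-8" and newline="", so no manual .encode() is needed.
    """
    writer = csv.DictWriter(out, fieldnames=FIELDNAMES, quoting=csv.QUOTE_ALL)
    for line in src:
        if not line.strip():
            continue  # skip blank lines, which json.loads would reject
        tweet = json.loads(line)
        writer.writerow({"id": tweet["id"], "text": tweet["text"]})

# Real use, with the placeholder file names from the question:
# with open("MY_SOURCE_FILE", encoding="utf-8") as f, open("MY_OUTPUT", "a", encoding="utf-8", newline="") as out:
#     tweets_to_csv(f, out)

# In-memory demo with a hypothetical tweet line
src = io.StringIO('{"id": 1, "text": "Read more\\u2026"}\n')
out = io.StringIO(newline="")
tweets_to_csv(src, out)
print(out.getvalue())
```

Passing file objects instead of file names keeps the conversion logic independent of where the data lives. The same function works on real files and on in-memory buffers.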

From the Python 2 documentation of the csv module:

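Note: This version of the csv module doesn't support Unicode input. Also, there are currently some issues regarding ASCII NUL characters. Accordingly, all input should be UTF-8 or printable ASCII to be safe; see the examples in section Examples.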
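That limitation applies only to Python 2. The Python 3 csv module accepts `str` values containing any Unicode characters. Carrying the Python 2 `.encode("utf-8")` fix over to Python 3 is a common mistake. The writer stringifies non-string values with `str()`, so a `bytes` value is written as its `b'...'` representation instead of as text. The following Python 3 comparison writes to an in-memory buffer. The helper `write_field` is hypothetical.

```python
import csv
import io

text = "Read more\u2026"

def write_field(value):
    """Write a one-field CSV row to an in-memory buffer and return the text."""
    buf = io.StringIO(newline="")
    csv.writer(buf).writerow([value])
    return buf.getvalue()

good = write_field(text)                 # str: the ellipsis is written as-is
bad = write_field(text.encode("utf-8"))  # bytes: written via str(), i.e. "b'...'"
print(repr(good))
print(repr(bad))
```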


Source: https://habr.com/ru/post/1584721/

