Python: Unicode and "\xe2\x80\x99" make me batty

Question

So, I have a .txt file from Google Docs containing some lines from David Foster Wallace's "Oblivion". Using:

with open("oblivion.txt", "r", 0) as bookFile:
    wordList = []
    for line in bookFile:
        wordList.append(line)

and returning and printing the list of words, I get:

"surgery on the crow\xe2\x80\x99s feet around her eyes."

(and it truncates a lot of text). However, if instead of appending to wordList, I simply

for line in bookFile:
    print line

everything turns out fine! The same goes for calling .read() on the file: the resulting str does not have a crazy byte representation, but then I cannot manipulate it the way I want.

Where do I .encode() or .decode(), or what? ~~Using Python 2 because 3 gave me some I/O buffer error.~~ Thanks.

+7

python unicode character-encoding

Luke mcpuke Jul 01 '17 at 10:48

2

For Python 2, adapting Rahul's answer:

import io
with io.open("oblivion.txt", "r", encoding='utf-8') as bookFile:
    wordList = bookFile.readlines()

0

CodeMonkey 24 . '19 11:39

Rahul · Accepted Answer · 2017-07-01T10:51:02+0000

Open the file with the encoding set to utf-8:

with open("oblivion.txt", "r", encoding='utf-8') as bookFile:
    wordList = bookFile.readlines()
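
To see why this helps, here is a minimal sketch, not part of the original answer. The three bytes \xe2\x80\x99 are the UTF-8 encoding of U+2019, the curly apostrophe that Google Docs substitutes for a plain '. In Python 2, printing a list shows the repr of each byte string, which escapes non-ASCII bytes, while printing a single line sends the raw bytes to the terminal, which renders them. Decoding on read gives text in which the apostrophe is one character. The sample sentence and temporary file below are stand-ins for oblivion.txt, and io.open is used because it accepts encoding on both Python 2 and Python 3.

```python
import io
import os
import tempfile

# The bytes from the question, as they appear in an undecoded byte string.
raw = b"crow\xe2\x80\x99s feet"

# Decoding as UTF-8 collapses the three bytes into one character, U+2019.
text = raw.decode("utf-8")

# Hypothetical stand-in for oblivion.txt: one sample line in a temp file.
sample = u"surgery on the crow\u2019s feet around her eyes.\n"
fd, path = tempfile.mkstemp(suffix=".txt")
os.close(fd)
try:
    with io.open(path, "w", encoding="utf-8") as out_file:
        out_file.write(sample)

    # Reading with an explicit encoding yields decoded text, not raw bytes.
    with io.open(path, "r", encoding="utf-8") as book_file:
        word_list = book_file.readlines()
finally:
    os.remove(path)
```

After this runs, word_list holds decoded lines that can be split or sliced without cutting a character in half. As for the struck-out note in the question: on Python 3 the third argument 0 in open("oblivion.txt", "r", 0) asks for unbuffered I/O, which is only allowed in binary mode, so dropping that argument should avoid the error.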
