Python: Unicode and "\xe2\x80\x99" make me batty

So, I have a .txt file from Google Docs containing some lines from David Foster Wallace's "Oblivion". Using:

with open("oblivion.txt", "r", 0) as bookFile:
    wordList = []
    for line in bookFile:
        wordList.append(line)

and returning and printing the list of words, I get:

"surgery on the crow\xe2\x80\x99s feet around her eyes." 

(and it truncates a lot of text). However, if instead of appending to wordList, I simply do

for line in bookFile:
    print line

everything turns out fine! The same goes for calling .read() on the file: the resulting str does not have the crazy byte representation, but then I cannot manipulate it the way I want.

Where do I .encode() or .decode(), or what? I'm using Python 2 because 3 gave me some I/O buffer error. Thanks.

+7
2

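Open the file with encoding='utf-8'. This form works in Python 3; the built-in open() in Python 2 has no encoding parameter. Also drop the 0 buffering argument, since Python 3 does not allow unbuffered text I/O: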

with open("oblivion.txt", "r", encoding='utf-8') as bookFile:
    wordList = bookFile.readlines()
+11

For Python 2, building on Rahul's answer, use io.open, which accepts the encoding parameter:

import io
with io.open("oblivion.txt", "r", encoding='utf-8') as bookFile:
    wordList = bookFile.readlines()
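
As for why the list showed \xe2\x80\x99 while print looked fine: printing a list shows the repr of each element, and those three bytes are simply the UTF-8 encoding of U+2019 (the right single quotation mark). Here is a minimal sketch of the difference between reading bytes and reading decoded text. The temp file is a hypothetical stand-in for oblivion.txt, and the names raw, byteLines and apostrophe are made up for illustration:

```python
import io
import os
import tempfile

# The bytes from the question: \xe2\x80\x99 is U+2019 encoded as UTF-8.
raw = b"surgery on the crow\xe2\x80\x99s feet around her eyes.\n"

# Hypothetical stand-in for oblivion.txt, written to a temp directory.
path = os.path.join(tempfile.mkdtemp(), "oblivion.txt")
with io.open(path, "wb") as f:
    f.write(raw)

# Binary mode returns undecoded bytes: the apostrophe occupies three bytes.
with io.open(path, "rb") as bookFile:
    byteLines = bookFile.readlines()

# Text mode with an encoding returns decoded text: one character.
with io.open(path, "r", encoding="utf-8") as bookFile:
    wordList = bookFile.readlines()

apostrophe = wordList[0][19]
print(repr(byteLines[0][19:22]))
print(repr(apostrophe))
```

Once the lines are decoded, slicing and len() count characters rather than bytes, which is usually what you want when manipulating the text.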
0

Source: https://habr.com/ru/post/1680547/

