UnicodeDecodeError in python3

Question


I'm currently trying to run a simple regular expression over a very large .txt file (a couple of million lines of text). The simplest code that reproduces the problem:

file = open("exampleFileName", "r")
for line in file:
    pass

Error message:

Traceback (most recent call last):
  File "example.py", line 34, in <module>
    example()
  File "example.py", line 16, in example
    for line in file:
  File "/usr/lib/python3.4/codecs.py", line 319, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xed in position 7332: invalid continuation byte

How can I fix this? Is utf-8 the wrong encoding? And if so, how do I know which one is right?

Thanks and best regards!

+4

python regex utf-8 decoding

Elitekaffee Aug 17 '16 at 16:15


2 answers

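If you don't know the file's encoding, one option is to try the default first and fall back to another encoding if decoding fails: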

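try:
    # Decoding happens on read, not in open(), so read inside the try block.
    with open("exampleFileName", "r", encoding="utf-8") as file:
        lines = file.readlines()
except UnicodeDecodeError:
    # Not valid UTF-8: fall back to another encoding.
    with open("exampleFileName", "r", encoding="latin2") as file:
        lines = file.readlines()
    # ...try further encodings here if this one gives wrong characters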
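The same fallback idea can be sketched as a small helper that walks a list of candidate encodings. This is only an illustration: the function name read_lines_with_fallback and the default candidate list are assumptions, not something from the question. Keep in mind that open() itself does not decode anything, so the read has to sit inside the try block.

```python
def read_lines_with_fallback(path, encodings=("utf-8", "latin2")):
    """Return (lines, encoding) for the first candidate that decodes the whole file.

    Hypothetical helper: the name and the default candidates are illustrative.
    """
    for encoding in encodings:
        try:
            # Decoding happens while reading, so the read must be inside the try.
            with open(path, "r", encoding=encoding) as handle:
                return handle.readlines(), encoding
        except UnicodeDecodeError:
            continue  # this candidate failed; try the next one
    raise ValueError("none of the candidate encodings could decode " + path)
```

A successful decode only means the bytes were valid in that encoding. It does not prove the encoding is the right one, so it is still worth checking a few non-ASCII lines by eye.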

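Python's standard library cannot detect a file's encoding for you.

If you are on a Unix-like system, running file -bi [filename] prints the file's MIME type and a guess at its charset, which is a reasonable starting point.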

UPD. See fooobar.com/questions/109862/... for how to do this on Windows.

0

light2yellow Aug 17 '16 at 16:40

mic4ael · Accepted Answer · 2016-08-17T16:25:33+0000

It seems the file is not valid UTF-8, so you should try reading it with the latin-1 encoding. Try:

file = open('exampleFileName', 'r', encoding='latin-1')
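Note that latin-1 maps every possible byte value to a character, so reading with it never raises UnicodeDecodeError. That also means it cannot tell you whether it is the correct encoding. To judge for yourself, it can help to look at the raw bytes around the failure. Here is a minimal sketch. The helper name show_decode_failure and its context parameter are made up for illustration, and it reads the whole file into memory, which may be heavy for a very large file.

```python
def show_decode_failure(path, encoding="utf-8", context=20):
    """Return (offset, nearby_bytes) for the first undecodable position, or None.

    Hypothetical helper for inspection only; it loads the entire file at once.
    """
    with open(path, "rb") as handle:
        data = handle.read()
    try:
        data.decode(encoding)
    except UnicodeDecodeError as error:
        # error.start and error.end delimit the offending bytes within data.
        start = max(error.start - context, 0)
        return error.start, data[start:error.end + context]
    return None  # the whole file decoded cleanly
```

Decoding the returned bytes with a few candidates (for example latin-1 and latin2) and comparing the results usually shows which one produces plausible text.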
