UnicodeDecodeError in Python 3

I'm currently trying to run some simple regular expressions over a very large .txt file (a couple of million lines of text). The simplest code that reproduces the problem:

file = open("exampleFileName", "r")
for line in file:
    pass

Error message:

Traceback (most recent call last):
  File "example.py", line 34, in <module>
    example()
  File "example.py", line 16, in example
    for line in file:
  File "/usr/lib/python3.4/codecs.py", line 319, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xed in position 7332: invalid continuation byte

How can I fix this? Is UTF-8 the wrong encoding? And if so, how do I find out which one is right?

Thanks and best regards!

2 answers

It looks like the file is not valid UTF-8, so you should try reading it with the latin-1 encoding instead. Try

file = open('exampleFileName', 'r', encoding='latin-1') 
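
To see why this helps, here is a small sketch using made-up sample bytes (the string below is an assumption for illustration, not taken from the asker's file). In UTF-8 the byte 0xed starts a multi-byte sequence, so it must be followed by continuation bytes; latin-1 maps every byte to a character and therefore never raises, although the result is only correct if the file really was written in latin-1.

```python
# Hypothetical sample: "así es" as it would be stored in a latin-1 file.
raw = b"as\xed es"

try:
    raw.decode("utf-8")
    utf8_ok = True
    reason = None
except UnicodeDecodeError as exc:
    # 0xed announces a multi-byte sequence, but a plain space follows it.
    utf8_ok = False
    reason = exc.reason

# latin-1 assigns a character to each of the 256 byte values, so this cannot fail.
text = raw.decode("latin-1")
print(utf8_ok, reason, text)
```

Because latin-1 accepts any input, a successful read does not prove the encoding is right; check a few non-ASCII words in the output by eye.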

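There is no single "right" encoding. If you don't know which one the file uses, you can try several in turn (falling back when decoding fails), for example: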

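try:
    # open() itself does not decode anything; the error is raised on read.
    with open("exampleFileName", "r", encoding="utf-8") as file:
        lines = file.readlines()
except UnicodeDecodeError:
    try:
        with open("exampleFileName", "r", encoding="latin2") as file:
            lines = file.readlines()
    except UnicodeDecodeError:
        raise  # ... try further encodings here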
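
The same fallback idea can be written as a loop over candidate encodings. This is only a sketch: the function name read_lines_with_fallback and the default candidate list are assumptions for illustration, and the order matters, because single-byte codecs such as latin2 accept almost any input and should come last.

```python
def read_lines_with_fallback(path, encodings=("utf-8", "latin2")):
    """Return (lines, encoding) for the first encoding that decodes the whole file.

    Hypothetical helper; the candidate list is an assumption, adjust it to your data.
    """
    for encoding in encodings:
        try:
            with open(path, "r", encoding=encoding) as handle:
                # Decoding happens here, while reading, not in open().
                return handle.readlines(), encoding
        except UnicodeDecodeError:
            continue  # this codec rejected some byte; try the next candidate
    raise ValueError(f"none of {encodings} could decode {path!r}")
```

For a file with millions of lines, readlines() holds everything in memory; once the working encoding is known, the file can be reopened with it and iterated line by line.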

The list of supported encodings is in the Python documentation.

Also, on Linux, you can run file -bi [filename] to get a guess at the file's encoding.

UPD. According to fooobar.com/questions/109862/..., there is also a way to do this on Windows.


Source: https://habr.com/ru/post/1651572/

