Catching Python codec errors?

File "/usr/lib/python3.1/codecs.py", line 300, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf8' codec can't decode byte 0x92 in position 805: invalid start byte

Hi, I am getting this exception. How do I catch it and keep reading my files when it occurs?

My program has a loop that reads text files one after another and tries to do some processing on each line. However, some of the files I encounter may not be text files, or may contain lines that do not decode properly (foreign language, etc.). I want to ignore these lines.

This does not work:

for line in sys.stdin:
   if line != "":
      try:
         matched = re.match(searchstuff, line, re.IGNORECASE)
         print (matched)
      except UnicodeDecodeError, UnicodeEncodeError:
         continue
1 answer

Take a look at http://docs.python.org/py3k/library/codecs.html. When you open the codec stream, you probably want to pass the extra argument errors='ignore'.
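As a rough illustration of what errors='ignore' does, here is a minimal sketch that uses the built-in open() rather than a codecs stream (open() accepts the same errors argument). The helper name read_lines_ignoring_errors and the temporary demo file are made up for this example and are not part of the original question.

```python
import os
import tempfile


def read_lines_ignoring_errors(path):
    """Return the lines of a file decoded as UTF-8, silently dropping bad bytes."""
    # errors="ignore" tells the decoder to skip bytes it cannot decode
    # instead of raising UnicodeDecodeError.
    with open(path, "r", encoding="utf-8", errors="ignore") as stream:
        return list(stream)


# Demo: write a file containing the stray byte 0x92 from the traceback.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"it\x92s fine\nsecond line\n")
    demo_path = tmp.name

demo_lines = read_lines_ignoring_errors(demo_path)
os.remove(demo_path)
```

Note that the offending byte is dropped, not the whole line, so the rest of the line is still processed.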

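In Python 3, sys.stdin is a text stream by default (see http://docs.python.org/py3k/library/sys.html) and decodes its input with strict error checking.

You need to reopen it as a utf-8 stream that tolerates errors. Something like this: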

sys.stdin = codecs.getreader('utf8')(sys.stdin.detach(), errors='ignore')
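To see how this fits the loop from the question, here is a small self-contained sketch. It assumes an io.BytesIO object as a stand-in for sys.stdin.detach() so it can run without piped input; the names tolerant_reader, matching_lines and fake_stdin are hypothetical and only used for illustration.

```python
import codecs
import io
import re


def tolerant_reader(binary_stream):
    """Wrap a binary stream in a UTF-8 reader that skips undecodable bytes."""
    return codecs.getreader("utf8")(binary_stream, errors="ignore")


def matching_lines(text_stream, pattern):
    """Collect the lines whose start matches pattern, ignoring case."""
    return [
        line
        for line in text_stream
        if line != "" and re.match(pattern, line, re.IGNORECASE)
    ]


# Stand-in for sys.stdin.detach(): raw bytes with one invalid byte (0x92).
fake_stdin = io.BytesIO(b"Hello world\nhe\x92llo again\nother\n")
found = matching_lines(tolerant_reader(fake_stdin), r"hello")
```

In a real script the same wrapper would be applied once at startup, as in the line above: sys.stdin = tolerant_reader(sys.stdin.detach()). No try/except is needed around the loop because the decoder no longer raises.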

Source: https://habr.com/ru/post/1782587/

