Python: How to read and parse a Unicode (UTF-8) text file?

I am exporting UTF-8 text from Excel and I want to read and parse the incoming data using Python. I have read everything I could find online, so I have already tried this, for example:

import codecs
txtFile = codecs.open( 'halout.txt', 'r', 'utf-8' )
for line in txtFile:
    print repr( line )

The error I am getting is:

UnicodeDecodeError: 'utf8' codec can't decode byte 0xff in position 0: unexpected code byte

Looking at the text file in a hex editor, the first bytes are FF FE. I also tried skipping them with:

txtFile.seek( 2 )

right after "open", but it just causes a different error.

+3
4 answers

That is the byte order mark (BOM).

EDIT: judging from the comments, this appears to be a UTF-16 BOM, so

codecs.open('foo.txt', 'r', 'utf-16')

should work.
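If the encoding is not known in advance, the leading bytes can be compared against the BOM constants in the `codecs` module. The following is only a minimal sketch: `guess_encoding_from_bom` is a hypothetical helper name, and falling back to UTF-8 when no BOM is present is an assumption, not something a BOM check can confirm.

```python
import codecs

# Known byte order marks. The UTF-32 marks come first because the
# UTF-32 LE mark starts with the same two bytes as the UTF-16 LE mark.
_BOMS = [
    (codecs.BOM_UTF32_LE, 'utf-32'),
    (codecs.BOM_UTF32_BE, 'utf-32'),
    (codecs.BOM_UTF8, 'utf-8-sig'),
    (codecs.BOM_UTF16_LE, 'utf-16'),
    (codecs.BOM_UTF16_BE, 'utf-16'),
]

def guess_encoding_from_bom(raw, default='utf-8'):
    """Return a codec name based on the first bytes of raw (hypothetical helper)."""
    for bom, name in _BOMS:
        if raw.startswith(bom):
            return name
    # No BOM found: the default is an assumption, not a detection result.
    return default
```

The returned names ('utf-16', 'utf-32', 'utf-8-sig') are codecs that consume the BOM themselves, so the decoded text does not start with a stray U+FEFF character.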

+2

That file is not UTF-8; it is UTF-16LE with a byte order mark.
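A small illustration of why the UTF-8 codec fails at position 0, using a made-up byte string (not the contents of halout.txt) that begins with the same FF FE bytes:

```python
# Bytes as a hex editor would show them: FF FE 68 00 69 00
# That is the UTF-16LE BOM followed by 'h' and 'i'.
raw = b'\xff\xfeh\x00i\x00'

try:
    raw.decode('utf-8')
    utf8_ok = True
except UnicodeDecodeError:
    # 0xFF never occurs in valid UTF-8, so decoding fails on the first byte.
    utf8_ok = False

# The 'utf-16' codec reads the BOM, picks little-endian, and drops the BOM.
text = raw.decode('utf-16')
```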

+5

Expanding on Jonathan's comment, this code should read the file correctly:

import codecs
txtFile = codecs.open( 'halout.txt', 'r', 'utf-16' )
for line in txtFile:
   print repr( line )
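
For code that has to run on both Python 2.6+ and Python 3, `io.open` accepts an `encoding` argument directly. A minimal sketch, where `read_utf16_lines` is a hypothetical helper name and the file is assumed to be UTF-16 with a BOM:

```python
import io

def read_utf16_lines(path):
    """Return the lines of a UTF-16 text file with line endings removed."""
    # The 'utf-16' codec uses the BOM to choose the byte order and strips it.
    with io.open(path, 'r', encoding='utf-16') as txt_file:
        return [line.rstrip(u'\r\n') for line in txt_file]
```

For the file in the question, this would be called as `read_utf16_lines('halout.txt')`.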
+1

Check whether the Excel file contains several blank rows followed by more values; that can cause an unexpected error.
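
One way to make the parsing step tolerant of blank rows is to skip them explicitly. This is only a sketch: `parse_rows` is a hypothetical helper, and it assumes the export is tab-delimited, which the question does not state.

```python
def parse_rows(lines, delimiter=u'\t'):
    """Split already-decoded lines into fields, skipping blank rows."""
    rows = []
    for line in lines:
        line = line.rstrip(u'\r\n')
        # A row of empty cells may consist only of delimiters,
        # so whitespace-only lines are treated as blank too.
        if not line.strip():
            continue
        rows.append(line.split(delimiter))
    return rows
```

A plain `split` does not handle quoted fields that contain the delimiter; for such data the standard `csv` module is the safer choice.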

0

Source: https://habr.com/ru/post/1724994/

