I have a file, myfile (pasted here; I hope the problematic bytes survived the copy/paste). I am trying to read it with:
import codecs
codecs.open('myfile', 'r', 'utf-8').read()
But it gives:
UnicodeDecodeError: 'utf8' codec can't decode byte 0xe5 in position 7128: invalid continuation byte
If I check the file:
» file myfile
myfile: C source, ISO-8859 text
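The error appears to be reproducible without the file itself. Below is a minimal sketch, assuming the offending byte 0xe5 is an ISO-8859-1 'å' followed by an ordinary ASCII letter. The sample bytes are made up for illustration and are not taken from myfile:

```python
# Hypothetical sample: the word "gard" with an a-ring, encoded as ISO-8859-1,
# so the accented letter is the single byte 0xe5.
sample = b"g\xe5rd"

try:
    sample.decode("utf-8")
    utf8_error = None
except UnicodeDecodeError as exc:
    utf8_error = exc

# In UTF-8, 0xe5 announces a three-byte sequence, but the next byte
# ('r', 0x72) is not a continuation byte, hence the error.
print(utf8_error)

# ISO-8859-1 maps every byte to exactly one character, so decoding succeeds.
text = sample.decode("iso-8859-1")
print(text)
```

If that assumption holds, passing 'iso-8859-1' instead of 'utf-8' to the open call should read the file without raising, although it does not prove that ISO-8859-1 is the encoding the author intended.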
- How can I read this file (ISO-8859) in Python?
- In general, how can I find out how a file is encoded?
I often deal with files that I did not generate myself (system files, random files downloaded from the Internet, files provided by suppliers or customers, ...), and these files do not state which encoding they use. Working in a multicultural environment (Europe), it is hard to guess how they were encoded. In most cases, even the person who provides the file has no idea about the encoding, since it is chosen behind the scenes by whatever editor or tool produced the file. Is there a reliable way to determine which encoding a file uses?
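To make the second question concrete, here is a rough sketch of the kind of trial-and-error I could imagine doing with the standard library only. The function name decode_with_fallback and the candidate list are my own assumptions, not an established recipe. A successful decode only means the bytes are valid in that encoding, not that the guess is right, and ISO-8859-1 accepts any byte sequence, so it has to come last:

```python
def decode_with_fallback(data, candidates=("utf-8", "cp1252", "iso-8859-1")):
    """Return (text, encoding) for the first candidate that decodes without error.

    Hypothetical helper: the candidate order is a guess and should be adapted
    to the encodings that are plausible for the files at hand.
    """
    for encoding in candidates:
        try:
            return data.decode(encoding), encoding
        except UnicodeDecodeError:
            continue  # not valid in this encoding, try the next one
    raise ValueError("none of the candidate encodings could decode the data")


# Usage with made-up bytes; for a real file, read it in binary mode first,
# e.g. open('myfile', 'rb').read().
text, encoding = decode_with_fallback(b"g\xe5rd")
print(encoding, text)
```

Strict encodings such as UTF-8 are tried first because they reject most byte sequences that were written in another encoding, which makes a successful UTF-8 decode a fairly strong hint.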