I am working on a Python script that reads an XML file encoded using UTF-8, performs some manipulations with it, and saves it to Google Datastore (this is an App Engine program).
I read and parse the file with just file.readline() and a few regexes. The only problem is that the file I'm working with contains characters from different languages: for example, it can have é or Å, or Russian or Greek letters.
At first I got this error: "UnicodeDecodeError: 'ascii' codec can't decode byte 0xd0 in position 0: ordinal not in range(128)". Then I tried opening the file with the encoding "ISO-8859-15", which gets rid of the error, but the characters are not displayed correctly.
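The symptoms can be reproduced without the file itself. This is only a minimal sketch, assuming the data really is UTF-8: the sample bytes (a Cyrillic word starting with byte 0xd0) and the helper `try_decode` are made up for illustration and are not taken from my script.

```python
# -*- coding: utf-8 -*-
# Hypothetical sample: the Cyrillic word "Привет" as UTF-8 bytes.
# The first byte is 0xd0, the same byte the error message complains about.
raw = b"\xd0\x9f\xd1\x80\xd0\xb8\xd0\xb2\xd0\xb5\xd1\x82"


def try_decode(data, encoding):
    """Return the decoded text, or None if the bytes are invalid in that encoding."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return None


# ASCII only covers bytes 0-127, so decoding fails on 0xd0.
as_ascii = try_decode(raw, "ascii")

# ISO-8859-15 maps every single byte to some character, so there is no error,
# but each two-byte UTF-8 sequence turns into two unrelated characters.
as_latin9 = try_decode(raw, "iso-8859-15")

# Decoding with the encoding the bytes were actually written in gives the text back.
as_utf8 = try_decode(raw, "utf-8")
```

Here the ASCII attempt fails, the ISO-8859-15 attempt silently produces twice as many characters as the original word, and only the UTF-8 attempt returns the six intended letters, which matches what I am seeing with the real file.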
So my question is: how do I work with a UTF-8 encoded file in Python without Python choking on all the special characters in the file? I hope this was clear enough, and thanks in advance for any advice.