Python UTF-8 cannot decode bytes on a 32-bit machine

It works fine on 64-bit machines, but for some reason it does not work on Python 2.4.3 on a 32-bit instance.

I get an error

'utf8' codec can't decode bytes in position 76-79: invalid data 

for this code:

    try:
        str(sourceresult.sourcename).encode('utf8', 'replace')
    except:
        raise Exception(repr(sourceresult.sourcename))

It returns 'kazamidori blog \xf9'

I modified the site.py file to make UTF8 the default encoding, but it doesn't seem to work.

+4
4 answers

We need the following, and we need the exact output:

    type(sourceresult.sourcename)  # I suspect it is already a UTF-8 encoded string
    repr(sourceresult.sourcename)

As I said, I'm pretty sure that your sourceresult.sourcename is already UTF-8 encoded.

Maybe this one might help a bit.

EDIT: It seems your sourceresult.sourcename is encoded as cp1252. I do not know what mystring (which you mention in the comment) is. So, to get the UTF-8 encoded string, you need to do:

    source_as_UTF8 = sourceresult.sourcename.decode("cp1252").encode("utf-8")

However, a cp1252-encoded string is not consistent with the error message you quoted.
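
The re-encoding step above can be sketched as follows. This is only an illustration written for Python 3, where a bytes object stands in for the Python 2 byte string; the sample value is the one quoted in the question, the variable names are made up, and treating the value as cp1252 is the assumption made in this answer.

```python
# Python 3 sketch: a bytes object plays the role of a Python 2 str.
raw = b'kazamidori blog \xf9'  # sample value from the question, assumed to be cp1252

# Decoding as UTF-8 fails: the byte 0xf9 is never valid in UTF-8.
try:
    raw.decode('utf-8')
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# Decode with the encoding the bytes are actually in, then encode as UTF-8.
text = raw.decode('cp1252')
source_as_utf8 = text.encode('utf-8')
```

Here 0xf9 becomes the character U+00F9 after decoding, which UTF-8 represents as the two bytes 0xc3 0xb9.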

+7

"Invalid data" usually means that the input contains byte sequences that are not valid in the character set it is being decoded with.

This often happens because some of the data is encoded in a character set other than UTF-8.

For example, the file whose contents end up in the string may not have been converted to UTF-8 when you made UTF-8 the default character set. (On Windows, you can usually specify the file encoding in the Save As ... dialog box of your text editor.)

Or the data may come from a database that uses a different character set in its tables, in the connection, or both.

Check where the data comes from and which encodings are set along that path.
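
One rough way to check which encoding a given byte string is in is to try a short list of candidates in order. The helper below is a hypothetical sketch for Python 3 (the name decode_with_fallback and the candidate list are assumptions, not something from the question). A clean decode does not prove the guess is right, so it is a diagnostic aid rather than a fix.

```python
def decode_with_fallback(data, encodings=('utf-8', 'cp1252')):
    """Return (text, encoding) for the first candidate that decodes data cleanly."""
    for name in encodings:
        try:
            return data.decode(name), name
        except UnicodeDecodeError:
            # Not valid in this encoding, try the next candidate.
            continue
    raise ValueError('none of %r could decode %r' % (encodings, data))


# Bytes that are not valid UTF-8 fall through to cp1252.
text, used = decode_with_fallback(b'blog \xf9')
```

Putting utf-8 first is deliberate: it is strict enough that bytes from a single-byte encoding rarely decode as valid UTF-8 by accident.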

0

I think the problem is that you are using the str() function. Keep in mind that str() returns narrow strings, i.e. strings with 1 byte per character. If the input, sourceresult.sourcename, is unicode, then Python will automatically encode it to return a narrow string. By default, the system encoding is used for this, which is probably something like ISO-8859-1.

So you get an error because it makes no sense to call encode() on an already encoded string. If you get rid of str(), it should work.
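
To see why an encode call can raise a decode error: in Python 2, calling encode() on a byte string first decodes it implicitly with the default encoding, and that hidden step is the one that fails. The sketch below spells the two steps out in Python 3; py2_style_encode and its parameters are hypothetical names used only for illustration.

```python
def py2_style_encode(byte_string, target='utf-8', default_encoding='ascii'):
    """Mimic what Python 2 does for str.encode(): implicit decode, then encode."""
    # Python 2 performs this decode silently; this is where the error comes from.
    text = byte_string.decode(default_encoding)
    return text.encode(target)


raw = b'kazamidori blog \xf9'  # sample value from the question

# With the default encoding switched to UTF-8 (as in the modified site.py),
# the hidden decode step rejects the byte 0xf9.
try:
    py2_style_encode(raw, default_encoding='utf-8')
    failed = False
except UnicodeDecodeError:
    failed = True
```

Pure ASCII input passes through unchanged, which is why the problem only shows up for some values.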

0

Make sure there isn't an odd number of bytes in the varchar field; I had a varchar(255) that blew up when someone entered a long string in Arabic. Then I got an “unexpected end” error (as you would expect...!)
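
A small Python 3 sketch of that failure, assuming the column limit cuts a UTF-8 string in the middle of a multi-byte character. The sample text and the 5-byte limit are made up for the example.

```python
arabic = '\u0639\u0631\u0628\u064a'  # four Arabic letters, 2 bytes each in UTF-8
encoded = arabic.encode('utf-8')     # 8 bytes in total

truncated = encoded[:5]  # a byte limit that cuts the third letter in half

try:
    truncated.decode('utf-8')
    reason = None
except UnicodeDecodeError as exc:
    reason = exc.reason  # the codec reports an unexpected end of data

# One way to cope: drop the incomplete trailing sequence when decoding.
safe_text = truncated.decode('utf-8', errors='ignore')
```

Dropping the partial character hides the symptom only; the lasting fix is a column that is wide enough, or truncating by characters instead of bytes before storing.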

0

Source: https://habr.com/ru/post/1305792/

