Python 2.7 encoding decoding

I have a problem with encoding / decoding. I read text from a file and compare it with text from a database (Postgres). The comparison is done between two lists.

From the file I get "jo\x9a" for "još", and from the database I get "jo\xc5\xa1" for the same value.

    # Items in both lists
    common = [a for a in codes_from_file if a in kode_prfoksov]
    # Items in one but not the other
    only1 = [a for a in codes_from_file if a not in kode_prfoksov]
    # Items only in another
    only2 = [a for a in kode_prfoksov if a not in codes_from_file]

How can I solve this? Which encoding should I use when comparing these two lists?

Thank you

2 answers

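The strings from your file seem to be encoded in Windows-1250, while the strings from your database seem to be UTF-8. The same character "š" is therefore stored as different bytes on each side ("\x9a" versus "\xc5\xa1"), so a byte-for-byte comparison fails.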

So you can convert all strings to unicode first:

    codes_from_file = [a.decode("windows-1250") for a in codes_from_file]
    kode_prfoksov = [a.decode("utf-8") for a in kode_prfoksov]

Or, if you don't want unicode strings, just convert the file strings to UTF-8 so that they match the bytes coming from the database:

 codes_from_file = [a.decode("windows-1250").encode("utf-8") for a in codes_from_file] 
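
As a minimal sketch of the first approach applied to the whole comparison, the snippet below decodes each list with its own encoding and then compares the results as sets. The sample values are hypothetical stand-ins for the real data, and it assumes that order and duplicates in the lists do not matter. Bytes literals (`b"..."`) are used so the code behaves the same on Python 2.7 and Python 3.

```python
# -*- coding: utf-8 -*-

# Hypothetical sample data standing in for the two lists in the question.
codes_from_file = [b"jo\x9a", b"abc"]      # bytes read from the Windows-1250 file
kode_prfoksov = [b"jo\xc5\xa1", b"xyz"]    # bytes returned by the database (UTF-8)

# Decode each side with its own encoding so both hold unicode text.
file_codes = set(a.decode("windows-1250") for a in codes_from_file)
db_codes = set(a.decode("utf-8") for a in kode_prfoksov)

common = file_codes & db_codes   # items present in both
only1 = file_codes - db_codes    # items only in the file
only2 = db_codes - file_codes    # items only in the database
```

Set operations also avoid the repeated list scans of `a in kode_prfoksov`, which may matter if the lists are long. If order or duplicates are significant, the list comprehensions from the question work unchanged on the decoded lists.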

The first seems to be windows-1250, and the second is utf-8.

    >>> print 'jo\x9a'.decode('windows-1250')
    još
    >>> print 'jo\xc5\xa1'.decode('utf-8')
    još
    >>> 'jo\x9a'.decode('windows-1250') == 'jo\xc5\xa1'.decode('utf-8')
    True
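
Another option, sketched here under the assumption that the file really is Windows-1250, is to decode at read time with `io.open`, which accepts an `encoding` argument on Python 2.7 as well as Python 3. The helper name `read_codes` and the temporary demo file are hypothetical and serve only to illustrate the idea.

```python
import io
import os
import tempfile


def read_codes(path, encoding="windows-1250"):
    """Return the non-empty lines of a text file as unicode strings."""
    with io.open(path, encoding=encoding) as handle:
        stripped = (line.strip() for line in handle)
        return [line for line in stripped if line]


# Demo: write a small Windows-1250 file, then read it back as unicode.
fd, demo_path = tempfile.mkstemp()
os.close(fd)
with open(demo_path, "wb") as raw:
    raw.write(b"jo\x9a\r\nabc\r\n")

codes = read_codes(demo_path)
os.remove(demo_path)
```

With the file decoded this way, only the database values still need `.decode("utf-8")` before the comparison.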

Source: https://habr.com/ru/post/1402619/

