You can use the == operator to compare Unicode objects for equality.
    >>> s1 = u'Hello'
    >>> s2 = unicode("Hello")
    >>> type(s1), type(s2)
    (<type 'unicode'>, <type 'unicode'>)
    >>> s1==s2
    True
    >>>
    >>> s3='Hello'.decode('utf-8')
    >>> type(s3)
    <type 'unicode'>
    >>> s1==s3
    True
But your error message indicates that you are not comparing unicode objects. You are probably comparing a unicode object with a str object, for example:
    >>> u'Hello' == 'Hello'
    True
    >>> u'Hello' == '\x81\x01'
    __main__:1: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
    False
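See how, in the second comparison, I compared a unicode object with a str that cannot be converted to unicode implicitly. Python 2 decodes the str with the default codec (normally ASCII) for this comparison, and '\x81\x01' is neither valid ASCII nor valid UTF-8.
Your program, I suppose, compares unicode objects with str objects, and the contents of the str object cannot be decoded implicitly (they are not plain ASCII, and possibly not valid UTF-8 either). This seems like a likely result of the fact that you (the programmer) do not know which variable contains unicode, which variable contains UTF-8 and which variable contains bytes read from the file.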
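One way to take the guesswork out is to decode byte strings yourself before comparing, so that bad input fails loudly instead of quietly comparing unequal. This is only a minimal sketch: the helper name `to_text` is hypothetical, and it assumes your bytes are meant to be UTF-8. It is written to run on both Python 2 and Python 3.

```python
def to_text(value, encoding='utf-8'):
    """Return value as a text (unicode) object, decoding bytes explicitly."""
    if isinstance(value, bytes):  # on Python 2, bytes is an alias of str
        return value.decode(encoding)
    return value

# Both sides are text before the comparison happens.
print(to_text(b'Hello') == u'Hello')  # True

try:
    to_text(b'\x81\x01')
except UnicodeDecodeError as exc:
    # Undecodable bytes raise here rather than triggering a UnicodeWarning.
    print('cannot decode: %s' % exc)
```

If the real encoding of your data is something other than UTF-8, pass it as the `encoding` argument rather than relying on the default.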
I recommend http://nedbatchelder.com/text/unipain.html , especially the advice on creating a Unicode sandwich.
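As a rough illustration of the sandwich idea (bytes on the outside, unicode on the inside, decode and encode only at the edges), here is a small sketch. The file name `greeting.txt` and the UTF-8 encoding are assumptions made up for the example, not something taken from your program.

```python
import io
import os
import tempfile

# Hypothetical input file, created here so the sketch is self-contained.
path = os.path.join(tempfile.mkdtemp(), 'greeting.txt')
with io.open(path, 'wb') as f:
    f.write(u'H\xe9llo\n'.encode('utf-8'))

# Edge (input): decode bytes to unicode as soon as they enter the program.
with io.open(path, 'r', encoding='utf-8') as f:
    line = f.read().strip()

# Inside: only unicode objects are compared.
matches = (line == u'H\xe9llo')

# Edge (output): encode back to bytes only when writing out.
with io.open(path, 'wb') as f:
    f.write((u'match: %s\n' % matches).encode('utf-8'))
```

With this layout every variable inside the program holds unicode, so the question of which variable holds what no longer comes up at comparison time.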
Robᵩ Aug 12 '13 at 18:43