Your error message:
'ascii' codec can't decode byte 0xed in position 6: ordinal not in range(128)
says that the 7th byte is 0xed. This is either the first byte of the UTF-8 sequence for some (possibly CJK) high-ordinal Unicode character (which is nothing like the reported facts), or it is your i-acute encoded in Latin1 or cp1252. I'm betting on cp1252.
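To make the two readings of 0xed concrete, here is a small sketch. It is written for Python 3 and should also run on Python 2.7. The continuation bytes 0x95 0x9c are picked purely for illustration and do not come from the question.

```python
import unicodedata

# Reading 1: 0xed is the lead byte of a three-byte UTF-8 sequence.
# The two continuation bytes are an arbitrary example, not data from the question.
as_utf8 = b'\xed\x95\x9c'.decode('utf-8')

# Reading 2: 0xed is a complete character in a single-byte encoding.
as_cp1252 = b'\xed'.decode('cp1252')
as_latin1 = b'\xed'.decode('latin1')

print(hex(ord(as_utf8)))            # 0xd55c, a high-ordinal code point
print(unicodedata.name(as_cp1252))  # LATIN SMALL LETTER I WITH ACUTE
print(as_cp1252 == as_latin1)       # True: both encodings agree on this byte
```

Since the text in question is Spanish, the second reading is the plausible one.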
If your file were encoded in UTF-8, the offending byte would be 0xc3, not 0xed:
Preliminaries:

    >>> import unicodedata
    >>> unicodedata.name(u'\xed')
    'LATIN SMALL LETTER I WITH ACUTE'
    >>> uc = u'Diga s\xed por'

What happens if the file is encoded in UTF-8:

    >>> infile = uc.encode('utf8')
    >>> infile
    'Diga s\xc3\xad por'
    >>> infile.encode('utf8')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 6: ordinal not in range(128)
    #### NOT the message reported in the question ####

What happens if the file is encoded in cp1252 or latin1 or similar:

    >>> infile = uc.encode('cp1252')
    >>> infile
    'Diga s\xed por'
    >>> infile.encode('utf8')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xed in position 6: ordinal not in range(128)
Having # -*- coding: utf-8 -*- at the start of your code does not magically ensure that your file is encoded in UTF-8. That is up to you and your text editor.
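One way to check what the bytes on disk really are, rather than trusting the coding declaration, is to try decoding them as UTF-8. The following is a rough sketch only: the helper name guess_encoding is made up for this example, and falling back to cp1252 is an assumption that matches this case, not a general detection method.

```python
def guess_encoding(raw):
    """Return 'utf-8' if the bytes decode cleanly as UTF-8.

    Otherwise return 'cp1252', which is only an assumed fallback:
    a failed UTF-8 decode does not prove the data is cp1252.
    """
    try:
        raw.decode('utf-8')
        return 'utf-8'
    except UnicodeDecodeError:
        return 'cp1252'


# The same text as in the session above, in both encodings.
sample_utf8 = u'Diga s\xed por'.encode('utf-8')
sample_cp1252 = u'Diga s\xed por'.encode('cp1252')

print(guess_encoding(sample_utf8))    # utf-8
print(guess_encoding(sample_cp1252))  # cp1252
```

For a real file, read it in binary mode (open(path, 'rb')) and pass the resulting bytes to the helper.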
Actions:
- Save the file as UTF-8.
- As suggested by others, you need u'blah l
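If your editor makes it awkward to save as UTF-8, the first action can also be done with a few lines of code. This is a minimal sketch, assuming the file really is cp1252 as guessed above. The function name reencode_to_utf8 and the temporary demo file are invented for the example. Back up the real file before trying something like this.

```python
import io
import os
import tempfile


def reencode_to_utf8(path, source_encoding='cp1252'):
    """Rewrite the file at `path` as UTF-8, assuming it is in source_encoding."""
    # newline='' leaves the file's line endings untouched
    with io.open(path, 'r', encoding=source_encoding, newline='') as f:
        text = f.read()
    with io.open(path, 'w', encoding='utf-8', newline='') as f:
        f.write(text)


# Demo on a throwaway file containing the cp1252 bytes from the session above.
fd, demo_path = tempfile.mkstemp(suffix='.py')
os.close(fd)
with io.open(demo_path, 'wb') as f:
    f.write(u'Diga s\xed por'.encode('cp1252'))

reencode_to_utf8(demo_path)

with io.open(demo_path, 'rb') as f:
    converted = f.read()
os.remove(demo_path)

# The single byte 0xed has become the two-byte UTF-8 sequence 0xc3 0xad.
print(converted == u'Diga s\xed por'.encode('utf-8'))  # True
```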