I am trying to write text and encode it as utf-8, where possible, using the following code:
outf.write((lang_name + "," + (script_name or "") + "\n").encode("utf-8", errors='replace'))
I get the following error:
File "C:\Python27\lib\encodings\cp1252.py", line 15, in decode
return codecs.charmap_decode(input,errors,decoding_table)
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 6: character maps to <undefined>
I thought the errors='replace' part of my encode call would handle this?
fwiw, I just open the file with
outf = open(outfile, 'w')
without explicit declaration of the encoding.
print repr(outf)
gives:
<open file 'myfile.csv', mode 'w' at 0x000000000315E930>
I split the write statement into separate concatenation, encoding, and file-write steps:
outstr = lang_name + "," + (script_name or "") + "\n"
encoded_outstr = outstr.encode("utf-8", errors='replace')
outf.write(encoded_outstr)
It is the concatenation that throws the exception.
Here are the two strings, via print repr(foo):
lang_name: 'G\xc4\x81ndh\xc4\x81r\xc4\xab'
script_name: u'Kharo\u1e63\u1e6dh\u012b'
Further detective work shows that I can concatenate either of them with a plain ASCII string without any difficulty; it is putting both of them into a single string that breaks things.
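To make the observation concrete, here is a minimal sketch built from the two values above. It rests on two assumptions I have not confirmed: that lang_name holds UTF-8 encoded bytes, and that cp1252 (the codec named in the traceback) is what Python falls back to when it has to turn the byte string into unicode during the concatenation. The names decode_failed and the explicit decode step are only there for illustration.

```python
# lang_name is a byte string; script_name is a unicode string (values from above).
lang_name = b'G\xc4\x81ndh\xc4\x81r\xc4\xab'
script_name = u'Kharo\u1e63\u1e6dh\u012b'

# Assumption: concatenating the two makes Python 2 decode the byte string
# implicitly. Doing that decode by hand with cp1252 fails, because byte 0x81
# has no mapping in that code page. Note that this is a *decode* error, so
# the errors='replace' argument passed to encode() is never reached.
try:
    lang_name.decode("cp1252")
    decode_failed = False
except UnicodeDecodeError:
    decode_failed = True

# Assumption: the bytes are really UTF-8. Decoding them explicitly first
# means the concatenation only ever sees unicode strings.
outstr = lang_name.decode("utf-8") + u"," + (script_name or u"") + u"\n"
encoded_outstr = outstr.encode("utf-8")
```

If those assumptions hold, encoded_outstr is a byte string that could be passed to outf.write() without any implicit conversion.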