To never suffer from Unicode errors again, switch to Python 3:
    % python3
    >>> with open('/tmp/foo', 'w') as f:
    ...     value = "Bitte überprüfen"
    ...     f.write(('"{}" = "{}";\n'.format('no_internet', value)))
    ...
    36
    >>> import sys
    >>> sys.exit(0)
    % cat /tmp/foo
    "no_internet" = "Bitte überprüfen";
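In a script rather than the REPL, the same Python 3 write could be sketched as follows. Passing `encoding="utf-8"` explicitly is an addition that is not in the session above: without it, `open()` falls back to the locale's preferred encoding, which may not be UTF-8 on every system. The temporary path is only a placeholder for `/tmp/foo`.

```python
import os
import tempfile

value = "Bitte überprüfen"
line = '"{}" = "{}";\n'.format("no_internet", value)

# Placeholder output location; the session above writes to /tmp/foo.
path = os.path.join(tempfile.mkdtemp(), "foo")

# Explicit encoding, so the result does not depend on the locale.
with open(path, "w", encoding="utf-8") as f:
    written = f.write(line)  # number of characters written, not bytes

# Read the file back as raw bytes to see what actually landed on disk.
with open(path, "rb") as f:
    raw = f.read()
```

`written` counts characters, while `raw` is longer because each "ü" takes two bytes in UTF-8.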
However, if you are stuck with Python 2 and have no choice:
    % python2
    >>> with open('/tmp/foo2', 'w') as f:
    ...     value = u"Bitte überprüfen"
    ...     f.write(('"{}" = "{}";\n'.format('no_internet', value.encode('utf-8'))))
    ...
    >>> import sys
    >>> sys.exit(0)
    % cat /tmp/foo2
    "no_internet" = "Bitte überprüfen";
And as @JuniorCompressor suggests, don't forget to add # encoding: utf-8 at the beginning of your Python 2 file to tell Python to read the source file as UTF-8 rather than ASCII!
Your mistake:
f.write(("\"%s\" = \"%s\";\n" % ("no_internet", value)).encode("utf-8"))
is that you encode the whole formatted string as UTF-8, whereas you must encode the value as UTF-8 before doing the formatting:
    >>> with open('/tmp/foo2', 'w') as f:
    ...     value = u"Bitte überprüfen"
    ...     f.write(('"{}" = "{}";\n'.format('no_internet', value).encode('utf-8')))
    ...
    Traceback (most recent call last):
      File "<stdin>", line 3, in <module>
    UnicodeEncodeError: 'ascii' codec can't encode character u'\xfc' in position 6: ordinal not in range(128)
This happens because format, called on a byte-string template, implicitly converts the unicode value using the ASCII codec, which fails on the "ü". The value is of the unicode type (which is what u"" creates), so you need to explicitly encode it as UTF-8 before handing it to format to build the new line.
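The failure can be reproduced in isolation. The sketch below uses Python 3 syntax and makes the ASCII encode explicit, standing in for the implicit conversion that Python 2 attempts; it illustrates the mechanism and is not the exact code path inside the Python 2 interpreter.

```python
value = "Bitte überprüfen"

try:
    # Stand-in for the implicit ASCII conversion Python 2 attempts.
    value.encode("ascii")
    failed_at = None
except UnicodeEncodeError as exc:
    failed_at = exc.start  # index of the first character that cannot be encoded

# Encoding explicitly as UTF-8 succeeds: each "ü" becomes the two bytes \xc3\xbc.
encoded = value.encode("utf-8")
```

Here `failed_at` points at the first "ü", which matches the "position 6" reported in the traceback.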
As Carl says in his answer, Python 2 is messy and flawed when it comes to Unicode strings, which defeats the "Explicit is better than implicit" principle from the Zen of Python. For even weirder behavior, the following works fine in Python 2:
    >>> value = "Bitte überprüfen"
    >>> out = '"{}" = "{}";\n'.format('no_internet', value)
    >>> out
    '"no_internet" = "Bitte \xc3\xbcberpr\xc3\xbcfen";\n'
    >>> print(out)
    "no_internet" = "Bitte überprüfen";
Still not convinced to switch to Python 3? :-)
Update:
This is how to read a Unicode string from one file and write it to another:
    % echo "Bitte überprüfen" > /tmp/foobar
    % python2
    >>> with open('/tmp/foobar', 'r') as f:
    ...     data = f.read().decode('utf-8').strip()
    ...
    >>>
    >>> with open('/tmp/foo2', 'w') as f:
    ...     f.write(('"{}" = "{}";\n'.format('no_internet', data.encode('utf-8'))))
    ...
    >>> import sys;sys.exit(0)
    % cat /tmp/foo2
    "no_internet" = "Bitte überprüfen";
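The same read-then-write flow can also be written so that it behaves the same on Python 2 and Python 3, by letting `io.open` do the decoding and encoding. This is an alternative sketch, not the approach used in the session above; the temporary directory and file names are placeholders, and on Python 2 the file would still need the `# encoding: utf-8` line.

```python
import io
import os
import tempfile

# Placeholder paths standing in for /tmp/foobar and /tmp/foo2.
workdir = tempfile.mkdtemp()
src = os.path.join(workdir, "foobar")
dst = os.path.join(workdir, "foo2")

# Create the input file, as the echo command does.
with io.open(src, "w", encoding="utf-8") as f:
    f.write(u"Bitte überprüfen\n")

# io.open decodes on read, so data is text (unicode on Python 2, str on Python 3).
with io.open(src, "r", encoding="utf-8") as f:
    data = f.read().strip()

# The template is a unicode literal too, so no implicit ASCII conversion occurs.
with io.open(dst, "w", encoding="utf-8") as f:
    f.write(u'"{}" = "{}";\n'.format(u"no_internet", data))

# Read the result back to check it.
with io.open(dst, "r", encoding="utf-8") as f:
    result = f.read()
```

With this layout the only places that mention an encoding are the `io.open` calls, so there are no scattered `.encode()` and `.decode()` calls to get wrong.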
Update:
As a general rule:
- when you get a UnicodeDecodeError, you should call .decode('utf-8') on the string containing the Unicode data, and
- when you get a UnicodeEncodeError, you should call .encode('utf-8') on the string containing the Unicode data.
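One way to apply this rule consistently is to convert at the boundaries of the program: decode bytes as soon as they come in, and encode text just before it goes out. The helper names `to_text` and `to_bytes` below are hypothetical, and the sketch uses the Python 3 types (`bytes` and `str`).

```python
def to_text(data, encoding="utf-8"):
    """Return data as text, decoding it first if it is bytes (hypothetical helper)."""
    if isinstance(data, bytes):
        return data.decode(encoding)
    return data


def to_bytes(text, encoding="utf-8"):
    """Return text as bytes, encoding it first if it is str (hypothetical helper)."""
    if isinstance(text, str):
        return text.encode(encoding)
    return text


raw = b"Bitte \xc3\xbcberpr\xc3\xbcfen"  # bytes, as read from a file or socket
text = to_text(raw)                       # work with text inside the program
line = '"{}" = "{}";\n'.format("no_internet", text)
out = to_bytes(line)                      # encode once, when writing out
```

Both helpers pass through values that are already of the target type, so calling them twice on the same value is harmless.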
Update: if you cannot upgrade to Python 3, you can at least make your Python 2 behave almost like Python 3 by using the following __future__ import:
from __future__ import absolute_import, division, print_function, unicode_literals
HTH