Python file input line: how to handle Unicode escaped characters?

Question

In a text file (test.txt), my line looks like this:

Gro\u00DFbritannien

When I read it in Python, the backslash shows up escaped:

>>> file = open('test.txt', 'r')
>>> input = file.readline()
>>> input
'Gro\\u00DFbritannien'
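The doubled backslash is only how the REPL displays the string. The line itself contains a single backslash followed by u00DF. A quick Python 3 check, as a sketch (the string is typed in as a raw literal here rather than read from test.txt):

```python
# The line from test.txt, typed as a raw literal: one backslash, then "u00DF".
line = r"Gro\u00DFbritannien"

print(repr(line))        # 'Gro\\u00DFbritannien'  (repr doubles the backslash)
print(line.count("\\"))  # 1: only one real backslash character
print(len(r"\u00DF"))    # 6: the escape is six characters, not a single ß
```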

How can I get this interpreted as Unicode? decode() and unicode() don't do the job.

The following code writes Gro\u00DFbritannien back to the file, but I want it to write Großbritannien:

>>> import codecs
>>> input.decode('latin-1')
u'Gro\\u00DFbritannien'
>>> out = codecs.open('out.txt', 'w', 'utf-8')
>>> out.write(input)
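Decoding as latin-1 cannot help here, because Latin-1 maps each byte to exactly one code point and does no escape processing. Sketched in Python 3 terms, with `raw` standing in for the bytes of test.txt:

```python
raw = b"Gro\\u00DFbritannien"  # bytes as stored in test.txt: backslash, u, 0, 0, D, F

text = raw.decode("latin-1")   # one byte -> one code point; escapes are left alone
print(text)                    # Gro\u00DFbritannien
print("\u00df" in text)        # False: no real ß was produced
```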

python unicode utf-8 decode

Michi May 11, '10 at 13:44

2 answers

Alex Martelli · Answer 1 · 2010-05-11T14:11:33+0000

You want the unicode_escape codec:

>>> x = 'Gro\\u00DFbritannien'
>>> y = unicode(x, 'unicode_escape')
>>> print y
Großbritannien

See the documentation for the many standard encodings that are part of the Python standard library.
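The unicode() built-in exists only in Python 2. A minimal Python 3 equivalent, assuming the escaped text is already in a str: encode it back to bytes (Latin-1 is a safe choice since the input here is plain ASCII) and then decode with unicode_escape:

```python
x = "Gro\\u00DFbritannien"  # str containing a literal backslash escape

# Python 3 str has no .decode(), so go through bytes first.
y = x.encode("latin-1").decode("unicode_escape")
print(y)  # Großbritannien
```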

Jacek Konieczny · Answer 2 · 2010-05-11T14:07:25+0000

Use 'unicode_escape':

>>> file = open('test.txt', 'r')
>>> input = file.readline()
>>> input
'Gro\\u00DFbritannien\n'
>>> input.decode('unicode_escape')
u'Gro\xdfbritannien\n'
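In Python 3, str has no decode() method, so the same idea means reading the file in binary mode and decoding the bytes. A sketch under that assumption; read_escaped_line is a hypothetical helper, and the temporary file only stands in for test.txt:

```python
import os
import tempfile


def read_escaped_line(path):
    """Hypothetical helper: read the first line of path and expand \\uXXXX escapes."""
    with open(path, "rb") as f:          # bytes: no text decoding yet
        raw = f.readline()
    return raw.decode("unicode_escape")  # backslash escapes -> real characters


# Demo: write a test.txt-style file into a temporary directory, then read it back.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "test.txt")
    with open(path, "wb") as f:
        f.write(b"Gro\\u00DFbritannien\n")
    line = read_escaped_line(path)

print(repr(line))  # 'Großbritannien\n'
```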

Or let codecs.open() do the decoding while reading:

>>> import codecs
>>> file = codecs.open('test.txt', 'r', 'unicode_escape')
>>> input = file.readline()
>>> input
u'Gro\xdfbritannien\n'
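For the original goal (getting Großbritannien, not the escape sequence, into out.txt as UTF-8), reading and writing can be combined. A Python 3 sketch; unescape_file is a hypothetical helper and the temporary directory stands in for the real files:

```python
import os
import tempfile


def unescape_file(src, dst):
    """Hypothetical helper: expand backslash escapes in src and write dst as UTF-8."""
    with open(src, "rb") as f:
        text = f.read().decode("unicode_escape")
    # newline="" disables newline translation so "\n" is written as-is on every OS.
    with open(dst, "w", encoding="utf-8", newline="") as f:
        f.write(text)


with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "test.txt")
    dst = os.path.join(tmp, "out.txt")
    with open(src, "wb") as f:
        f.write(b"Gro\\u00DFbritannien\n")
    unescape_file(src, dst)
    with open(dst, "rb") as f:
        result = f.read()

print(result)  # b'Gro\xc3\x9fbritannien\n'  (ß is C3 9F in UTF-8)
```

Keep in mind that unicode_escape decodes any non-escape bytes as Latin-1 and also expands other escapes such as \n or \t. It suits files that are plain ASCII with backslash escapes; a file that already mixes real UTF-8 characters with \uXXXX escapes would need different handling.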

Python documentation: http://docs.python.org/library/codecs.html#standard-encodings
