Unable to decode Unicode string in Python 2.4

Question

This is in Python 2.4. Here is my situation: I pull a string from the database, and it contains an umlauted 'o' (\xf6). At this point, if I run type(value), it returns str. I then try to run .decode('utf-8') and get an error ('utf8' codec can't decode bytes in position 1-4).

My real goal is simply to get type(value) to return unicode. I found an earlier question with some useful information, but the example from the accepted answer does not work for me. Am I doing something wrong?

Here is some code to reproduce the problem:

Name = 'w\xc3\xb6rner'.decode('utf-8')
file.write('Name: %s - %s\n' %(Name, type(Name)))

I never get to the write statement, because the first statement fails.

Thank you for your help.

Edit:

I verified that the DB encoding is utf8. So in my code to reproduce, I changed '\xf6' to '\xc3\xb6', and the failure still occurs. Is there a difference between 'utf-8' and 'utf8'?

The tip about using codecs to write to a file is handy (I will definitely use it), but in this scenario I am only writing to a log file for debugging purposes.

+3

python unicode decode

Rob lund Mar 20 '09 at 14:36

4 answers

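Your string is not encoded in UTF8. To convert it to "unicode", you need to decode it with the encoding it is actually in, so that Python knows how to interpret the bytes. For example, assuming cp1250: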

print 'w\xf6rner'.decode('cp1250')

To write unicode strings to a file:

import codecs
f = codecs.open("yourfile.txt", "w", "utf8")
f.write( ... )

When reading/writing the file this way, you work with "unicode" strings, and the encoding is handled for you.
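
As a rough illustration of the codecs.open approach, here is a minimal sketch that writes a unicode string through a UTF-8 encoding file object and then reads the raw bytes back. It is written to run on Python 3, where a byte string needs the b prefix (on Python 2.4, drop the prefix). The scratch file from tempfile and the variable names are assumptions made for the example, not part of the original answer.

```python
import codecs
import os
import tempfile

# Unicode text, obtained by decoding UTF-8 bytes (drop the b prefix on Python 2.4).
name = b'w\xc3\xb6rner'.decode('utf-8')

# Hypothetical scratch file, used only for this illustration.
fd, path = tempfile.mkstemp(suffix='.txt')
os.close(fd)

f = codecs.open(path, 'w', 'utf8')
try:
    # The codecs file object encodes the unicode text to UTF-8 on write.
    f.write(u'Name: %s\n' % name)
finally:
    f.close()

# Read the raw bytes back to see what was actually stored.
g = open(path, 'rb')
try:
    stored = g.read()
finally:
    g.close()
os.remove(path)

print(repr(stored))
```

On current Python 3 versions, the built-in open() with an encoding argument does the same job.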

+10

Jiri Mar 20 '09 at 14:43

This is obviously a 1-byte encoding. 'ö' in UTF-8 is '\xc3\xb6'.

The encoding could be any of:

ISO-8859-1
ISO-8859-2
ISO-8859-13
ISO-8859-15
Win-1250
Win-1252
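
One way to check the candidates is to decode the same bytes with each of them. The sketch below assumes the database returned the bytes w\xf6rner and uses Python's codec names for the encodings listed above (cp1250 and cp1252 for the two Windows code pages). It is written for Python 3; on Python 2.4, drop the b prefix.

```python
# Bytes as they are assumed to come from the database.
raw = b'w\xf6rner'

# Python codec names for the candidate encodings.
candidates = ['iso-8859-1', 'iso-8859-2', 'iso-8859-13',
              'iso-8859-15', 'cp1250', 'cp1252']

# Decode the same bytes with every candidate and collect the results.
decoded = dict((name, raw.decode(name)) for name in candidates)
for name in candidates:
    print('%-12s %r' % (name, decoded[name]))

# UTF-8 rejects the same bytes, because 0xf6 is not valid there on its own.
try:
    raw.decode('utf-8')
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False
print('utf-8 ok: %s' % utf8_ok)
```

Because all of these encodings place 'ö' at byte 0xf6, this particular string cannot tell them apart; other characters in the data would be needed to narrow the choice down.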

+5

vartec Mar 20 '09 at 14:55

You need to use "ISO-8859-1":

Name = 'w\xf6rner'.decode('iso-8859-1')
file.write('Name: %s - %s\n' %(Name, type(Name)))

UTF-8 uses at least 2 bytes to encode anything outside ASCII, but here it is only 1 byte, so iso-8859-1 is probably correct.
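
The byte counts can be checked directly. This is a small sketch (variable names are illustrative) that encodes the single character U+00F6 both ways:

```python
ch = u'\xf6'  # U+00F6, 'o' with diaeresis

as_latin1 = ch.encode('iso-8859-1')
as_utf8 = ch.encode('utf-8')

print(len(as_latin1))  # one byte: 0xf6
print(len(as_utf8))    # two bytes: 0xc3 0xb6
```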

+3

Staale Mar 20 '09 at 14:41

bobince · Accepted Answer · 2009-03-20T16:01:34+0000

So in my code to reproduce, I changed '\xf6' to '\xc3\xb6', and the failure still occurs

The first line does not fail:

>>> 'w\xc3\xb6rner'.decode('utf-8')
u'w\xf6rner'

It is the second line that raises the error:

>>> file.write('Name: %s - %s\n' %(Name, type(Name)))
UnicodeEncodeError: 'ascii' codec can't encode character u'\xf6' in position 7: ordinal not in range(128)

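That is because, when you write to a file, Python implicitly tries to encode the Unicode string as ASCII. As Jiri says, to write Unicode to a file you need to encode it, or open the file with codecs so that Unicode is encoded for you.

For debugging, use repr(). It writes the Unicode string as plain ASCII with escapes, so nothing needs encoding: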

name= 'w\xc3\xb6rner'.decode('utf-8')
file.write('Name: %r\n' % name)

Name: u'w\xf6rner'
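
The ASCII failure shown in the traceback above can be reproduced without a file, which also makes the possible workarounds easy to compare. This is only a sketch, written for Python 3 (on Python 2.4, drop the b prefix); the variable names and the backslashreplace variant are additions for illustration, not part of the original answer.

```python
# Unicode text, obtained by decoding UTF-8 bytes.
name = b'w\xc3\xb6rner'.decode('utf-8')
line = u'Name: %s\n' % name

# Encoding to ASCII fails, as in the traceback above.
try:
    line.encode('ascii')
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# Option 1: encode explicitly to UTF-8 before writing to a byte-oriented file.
utf8_line = line.encode('utf-8')

# Option 2: keep the log ASCII-only by escaping non-ASCII characters.
safe_line = line.encode('ascii', 'backslashreplace')

print(ascii_ok)
print(repr(utf8_line))
print(repr(safe_line))
```

The escaped form keeps a debug log readable in any viewer, at the cost of not storing the actual character.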
