Unicode error: correctly decoding / encoding a string in Python

I am using BeautifulSoup and I am getting back a string like this:

u'Dassault Myst\xe8re' 

This is a unicode string, but I want it to look like this:

 'Dassault Mystère' 

I tried encode(), decode(), and unicode(), for example:

 name = name.encode('utf-8')

The error I am getting is:

 UnicodeEncodeError: 'ascii' codec can't encode character u'\xe8' 

My default encoding seems to be "ascii": sys.getdefaultencoding() returns "ascii", although I have:

 #!/usr/bin/env python
 # encoding: utf-8

at the top of the file.

Hoping to solve this recurring Unicode problem once and for all!

thanks

1 answer

I don't know how or where you are getting this error, but look at this example:

 $ python
 Python 2.6.1 (r261:67515, Jun 24 2010, 21:47:49)
 [GCC 4.2.1 (Apple Inc. build 5646)] on darwin
 Type "help", "copyright", "credits" or "license" for more information.
 >>> txt = u'Dassault Myst\xe8re'
 >>> txt
 u'Dassault Myst\xe8re'
 >>> print txt
 Traceback (most recent call last):
   File "<stdin>", line 1, in <module>
 UnicodeEncodeError: 'ascii' codec can't encode character u'\xe8' in position 13: ordinal not in range(128)
 >>> ^D
 $ export LANG=en_US.UTF-8
 $ python
 Python 2.6.1 (r261:67515, Jun 24 2010, 21:47:49)
 [GCC 4.2.1 (Apple Inc. build 5646)] on darwin
 Type "help", "copyright", "credits" or "license" for more information.
 >>> txt = u'Dassault Myst\xe8re'
 >>> txt
 u'Dassault Myst\xe8re'
 >>> print txt
 Dassault Mystère
 >>> ^D

So, as you can see, if your console's encoding is ASCII, print converts the unicode string to ASCII, and any character outside the ASCII range raises an exception.

But if the console can accept Unicode (for example, a UTF-8 locale as set above), everything is displayed correctly.
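The two sessions above can be reproduced without touching the locale by doing the encoding step by hand. This is only a rough sketch: `encode_for_console` is a hypothetical helper, not something Python or BeautifulSoup provides, and it merely approximates what print does with the console's codec. It is written so that it also runs on Python 3, where the `u''` prefix is accepted and `str` plays the role of Python 2's `unicode`.

```python
# -*- coding: utf-8 -*-
import sys


def encode_for_console(text, encoding):
    """Hypothetical helper: encode `text` with the given console codec.

    Returns bytes, or raises UnicodeEncodeError when the codec cannot
    represent one of the characters.
    """
    return text.encode(encoding)


name = u'Dassault Myst\xe8re'

# An ASCII console cannot represent u'\xe8', so encoding fails.
try:
    encode_for_console(name, 'ascii')
    ascii_ok = True
except UnicodeEncodeError as exc:
    ascii_ok = False
    ascii_error = exc  # keep a reference for inspection

# A UTF-8 console can represent every Unicode character.
utf8_bytes = encode_for_console(name, 'utf-8')

# If the console really is ASCII-only, a lossy fallback avoids the exception.
safe_ascii = name.encode('ascii', 'replace')

# The codec print would actually use (may be None when output is redirected).
print(getattr(sys.stdout, 'encoding', None))
```

If the last line prints an ASCII codec name (or None), setting a UTF-8 locale as in the session above, or encoding explicitly before writing, are the usual ways out. The `# encoding: utf-8` line at the top of a script only tells Python how to decode the source file itself; it does not change the console codec.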


Source: https://habr.com/ru/post/1343483/

