At some point you will run into problems when the string you want to decode contains special characters, such as Chinese characters or emoticons. The errors look like this:
UnicodeEncodeError: 'ascii' codec can't encode characters in position 109-123: ordinal not in range(128)
In my case (Twitter data processing), I decoded as follows so that I could see all characters without errors (Python 2):
>>> s = '\u003cfoo\u003e'
>>> s.decode('unicode-escape').encode('utf-8')
'<foo>'
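The snippet above relies on Python 2's `str.decode`, which no longer exists on Python 3 strings. A rough Python 3 counterpart might look like the sketch below. The helper name `decode_escapes` is hypothetical, and the sketch assumes the input is a `str` containing literal `\uXXXX` sequences, possibly mixed with real non-ASCII characters.

```python
def decode_escapes(text):
    """Turn literal \\uXXXX sequences in a str into the characters they name.

    Hypothetical helper, not part of the original answer.
    """
    # Encode to bytes first: characters above U+00FF are rewritten as
    # backslash escapes, so real Chinese characters or emoji survive.
    raw = text.encode('latin-1', 'backslashreplace')
    # unicode_escape then resolves every escape, both the ones that were
    # already in the input and the ones produced by the step above.
    return raw.decode('unicode_escape')
```

With this sketch, `decode_escapes(r'\u003cfoo\u003e')` returns `'<foo>'`. Encoding with `backslashreplace` instead of plain ASCII is what avoids the `UnicodeEncodeError` when the input already contains non-ASCII characters.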
OkezieE, Mar 29 '14 at 3:06