Python 2.x has some rather unnatural encoding handling that allows implicit conversion. It will try to mix unicode and non-unicode strings when the user does not do the conversion themselves. In the end, this does not solve the problem: encoding has to be dealt with by the developer from the very beginning. Python 2 just makes things less explicit and a bit less obvious.
>>> u'è'.replace(u"\xe0", u"")
u'\xe8'
This is your original example, except that I explicitly told Python that all the strings are unicode. If you don't, Python will try to convert them for you. And since the default encoding in Python 2 is ASCII, this obviously will not work with your example.
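To make the failure concrete, here is a rough sketch of what that implicit conversion amounts to. I am assuming your byte string is UTF-8 encoded (I can't tell from the question), and `implicit_ascii_decode` is just a name I made up to stand in for the step Python 2 performs behind your back when a byte string meets a unicode string:

```python
# Bytes for the accented character, assuming the input is UTF-8 encoded.
raw = b'\xc3\xa8'


def implicit_ascii_decode(data):
    """Hypothetical stand-in for the ASCII decode Python 2 applies implicitly."""
    return data.decode('ascii')


try:
    implicit_ascii_decode(raw)
    failed = False
except UnicodeDecodeError:
    # Bytes above 0x7F are not valid ASCII, so the conversion blows up.
    failed = True

# Decoding explicitly with the right encoding gives the expected character.
explicit = raw.decode('utf-8')
```

The only difference between the two calls is that the second one names the encoding instead of leaving Python to fall back on ASCII.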
Encoding is a tricky subject, but with some good habits (like converting early and always being sure which type of data the program is handling at a given point), it usually (and I insist, USUALLY) goes well.
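As an illustration of the "convert early" habit, something along these lines is what I have in mind. Take it as a sketch rather than a recipe: `read_text` and `write_bytes` are hypothetical helpers, and UTF-8 is an assumption about your data:

```python
def read_text(raw_bytes, encoding='utf-8'):
    """Decode as soon as the data enters the program (encoding is assumed)."""
    return raw_bytes.decode(encoding)


def write_bytes(text, encoding='utf-8'):
    """Encode only when the data leaves the program."""
    return text.encode(encoding)


# Input boundary: bytes in, unicode out.
text = read_text(b'\xc3\xa8\xc3\xa0')

# Everything in the middle works on unicode only, so nothing is converted implicitly.
cleaned = text.replace(u'\xe0', u'')

# Output boundary: unicode in, bytes out.
output = write_bytes(cleaned)
```

This way there is exactly one place where decoding happens and one place where encoding happens, which makes it much easier to know what type you are holding at any point.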
Hope this helps!