In Python 2, printing a list does not print its items the way printing them individually would. The list builds its own string by calling __repr__ on each item, and the repr of a unicode string is an escaped literal such as u'w\xf6rld'. If you print each element separately, the element's own __str__ method is used instead of the list's, and you get what you expect.
    with open("example.txt", "r") as f:
        for inp in f:
            files = inp.decode('latin-1')  # just to make sure this works on different systems
            print files
            split = files.split()
            print split
            print split[0]
            print split[1]
Output:
    hello world
    [u'hello', u'world']
    hello
    world
    hello wörld
    [u'hello', u'w\xf6rld']
    hello
    wörld
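To see the same mechanism without depending on unicode handling, here is a minimal sketch using a made-up class, Word (hypothetical, purely for illustration), whose __str__ and __repr__ return visibly different text. It should behave the same on Python 2 and Python 3.

```python
class Word(object):
    """Hypothetical stand-in for a string type with distinct str/repr output."""

    def __init__(self, text):
        self.text = text

    def __str__(self):
        # Used when the item itself is printed or passed to str().
        return self.text

    def __repr__(self):
        # Used by the list when it builds its own string.
        return "Word(%r)" % (self.text,)


words = [Word("hello"), Word("world")]

# Converting the list to a string calls repr() on every item.
as_list = str(words)

# Converting each item separately calls str() on every item.
as_items = " ".join(str(w) for w in words)

print(as_list)   # [Word('hello'), Word('world')]
print(as_items)  # hello world
```

Joining the items yourself, as in the last assignment, is one way to print the contents of a list without the repr-style escapes.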