Python Lists with Scandinavian Letters

Can someone explain what causes this, so that I can better understand the environment?

emacs unix

input:

    with open("example.txt", "r") as f:
        for files in f:
            print files
            split = files.split()
            print split

output:

    Hello world
    ['Hello', 'world']
    Hello wörld
    ['Hello', 'w\xf6rld']
3 answers

When you print a list, Python shows the string representation of each element rather than the string itself. In that representation, every byte that is not printable ASCII (anything outside the ASCII range, or a control character) is displayed as an escape sequence.

The point is that you can copy this representation and paste it into Python code or into the interpreter, and it recreates the same value.
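As a rough illustration of that round trip, the sketch below takes the escaped representation of a byte string and evaluates it back into a value. The variable names are made up for the example, and the byte string is assumed to hold "wörld" encoded as Latin-1.

```python
import ast

# A byte string assumed to hold "wörld" encoded as Latin-1.
value = b"w\xf6rld"

# repr() gives the escaped, source-code form that the list printing showed.
shown = repr(value)
print(shown)

# Evaluating that text as a Python literal recreates the same value.
restored = ast.literal_eval(shown)
print(restored == value)
```

The escape sequence is only a way of writing the byte down; the value still contains a single byte there.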

The escape code \xf6 stands for a byte with the hexadecimal value F6, which, when interpreted as a Latin-1 byte, is the character ö.
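A small sketch of that interpretation, assuming the data really is Latin-1 (the question does not say which encoding the file uses):

```python
# The single byte F6, as it appeared in the printed list.
raw = b"\xf6"

# Decoding it as Latin-1 yields one character whose code point is also F6.
char = raw.decode("latin-1")
print(hex(ord(char)))

# The same decoding applied to the whole word from the question.
text = b"w\xf6rld".decode("latin-1")
print(text == u"w\u00f6rld")
```

Latin-1 maps every byte value directly to the code point with the same number, which is why the byte F6 becomes U+00F6 (ö).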

You probably want to decode this value to Unicode so that you can process the data consistently. If you do not yet know what Unicode is, or want to learn more about encodings, see:


In Python, printing a list does not print its strings directly. Printing the list makes the list call __repr__ on each item, which for strings produces the escaped form. If you print each element separately (in which case the string's __str__ method is used, not the list's), you get what you expect.

    with open("example.txt", "r") as f:
        for inp in f:
            # decode explicitly, just to make sure this works on different systems
            files = inp.decode('latin-1')
            print files
            split = files.split()
            print split
            print split[0]
            print split[1]

Output:

    hello world
    [u'hello', u'world']
    hello
    world
    hello wörld
    [u'hello', u'w\xf6rld']
    hello
    wörld
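The difference between printing a list and printing one of its elements can also be seen without any encoding involved. The sketch below uses a hypothetical class, Word, whose __str__ and __repr__ deliberately return different text:

```python
class Word(object):
    """Hypothetical class with deliberately different str() and repr() output."""

    def __init__(self, text):
        self.text = text

    def __str__(self):
        return self.text

    def __repr__(self):
        return "Word(%r)" % self.text


items = [Word("Hello"), Word("world")]

# Converting the list to text calls __repr__ on every element.
as_list = str(items)
print(as_list)

# Converting a single element to text calls its own __str__.
as_item = str(items[0])
print(as_item)
```

The list shows the repr form of each element, while the element on its own shows the plain text, which mirrors the output above.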

Python-mode.el

After adapting the print statements to Python 3 syntax,

py-execute-buffer-python3

prints beautifully:

Hello World

 ['Hello', 'world'] 

Hello wörld

 ['Hello', 'wörld'] 
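What the adapted script might look like is sketched below for Python 3. The file contents and the Latin-1 encoding are assumptions made for the example, since the original example.txt is not shown; the sketch writes its own temporary copy so that it is self-contained.

```python
import os
import tempfile

results = []
with tempfile.TemporaryDirectory() as tmp:
    # Recreate a file like example.txt, assumed here to be Latin-1 encoded.
    path = os.path.join(tmp, "example.txt")
    with open(path, "wb") as f:
        f.write(b"Hello world\nHello w\xf6rld\n")

    # Naming the encoding makes Python 3 decode each line to text while reading.
    with open(path, "r", encoding="latin-1") as f:
        for line in f:
            results.append(line.split())

# In Python 3 the repr of a text string keeps printable non-ASCII characters,
# so the list form no longer contains an escape sequence.
shown = repr(results[1])
```

Whether the character then displays correctly still depends on the encoding of the terminal or Emacs buffer that receives the output.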

Source: https://habr.com/ru/post/1489084/

