Python 2 newline tokens in the tokenize module

I am using the tokenize module in Python and wonder why there are two different newline tokens:

    NEWLINE = 4
    NL = 54

Any code example that produces both tokens would be appreciated.

+6
3 answers

According to the Python documentation:

tokenize.NL
Token value used to indicate a non-terminating newline. The NEWLINE token indicates the end of a logical line of Python code; NL tokens are generated when a logical line of code is continued over multiple physical lines.

More details here: https://docs.python.org/2/library/tokenize.html
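To see both tokens side by side, here is a minimal sketch. It assumes Python 3, where tokenize.generate_tokens accepts a readline callable that returns str; the question is about Python 2, where the numeric value of NL is different but the two token names mean the same thing. The helper name newline_tokens is made up for this illustration.

```python
import io
import tokenize


def newline_tokens(source):
    """Return (row, token name) for every NEWLINE and NL token in source.

    Hypothetical helper, written only for this illustration.
    """
    found = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in (tokenize.NEWLINE, tokenize.NL):
            found.append((tok.start[0], tokenize.tok_name[tok.type]))
    return found


# One logical line spread over two physical lines by the open parenthesis.
source = "total = (1 +\n         2)\n"

for row, name in newline_tokens(source):
    print(row, name)
```

The line break inside the parentheses should be reported as NL, and the one that ends the statement as NEWLINE.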

+5

There are at least 4 possible cases for '\n' in Python code; 2 of them are encoded as tokens:

  • A newline that terminates a statement: tokenize.NEWLINE. This token corresponds more or less to the ; of C or Java.

  • Any newline that does not terminate a statement and does not fall under cases 3 or 4: tokenize.NL.

  • Newlines inside multiline strings.

  • A newline that follows the line continuation character \. Contrary to what the documentation seems to imply, this case does not produce any token at all.

So the following example:

    # case 1
    a = 6
    b = 7

    # case 2
    answer = (
        a * b
    )

    # case 3
    format = """
    A multiline string
    """

    # case 4
    print "something that is continued" \
        "on the following line."

yields tokens covering all four cases:

    1,0-1,8: COMMENT '# case 1'
    1,8-1,9: NL '\n'
    2,0-2,1: NAME 'a'
    2,2-2,3: OP '='
    2,4-2,5: NUMBER '6'
    2,5-2,6: NEWLINE '\n'
    3,0-3,1: NAME 'b'
    3,2-3,3: OP '='
    3,4-3,5: NUMBER '7'
    3,5-3,6: NEWLINE '\n'
    4,0-4,1: NL '\n'
    5,0-5,8: COMMENT '# case 2'
    5,8-5,9: NL '\n'
    6,0-6,6: NAME 'answer'
    6,7-6,8: OP '='
    6,9-6,10: OP '('
    6,10-6,11: NL '\n'
    7,4-7,5: NAME 'a'
    7,6-7,7: OP '*'
    7,8-7,9: NAME 'b'
    7,9-7,10: NL '\n'
    8,0-8,1: OP ')'
    8,1-8,2: NEWLINE '\n'
    9,0-9,1: NL '\n'
    10,0-10,8: COMMENT '# case 3'
    10,8-10,9: NL '\n'
    11,0-11,6: NAME 'format'
    11,7-11,8: OP '='
    11,9-13,3: STRING '"""\nA multiline string\n"""'
    13,3-13,4: NEWLINE '\n'
    14,0-14,1: NL '\n'
    15,0-15,8: COMMENT '# case 4'
    15,8-15,9: NL '\n'
    16,0-16,5: NAME 'print'
    16,6-16,35: STRING '"something that is continued"'
    17,4-17,28: STRING '"on the following line."'
    17,28-17,29: NEWLINE '\n'
    18,0-18,0: ENDMARKER ''
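
Cases 3 and 4 can also be checked in isolation. The sketch below is an assumption-laden illustration rather than part of the original answer: it targets Python 3, so the Python 2 print statement is replaced by a plain assignment, and the helper token_names is a hypothetical name.

```python
import io
import tokenize


def token_names(source):
    """List the token type names produced for source (illustrative helper)."""
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    return [tokenize.tok_name[tok.type] for tok in tokens]


# Case 3: the newlines are swallowed by the single STRING token.
multiline = 'fmt = """\nA multiline string\n"""\n'

# Case 4: the newline after the backslash produces no token of its own.
continued = 'text = "something that is continued" \\\n    "on the following line."\n'

print(token_names(multiline))
print(token_names(continued))
```

In both listings the only newline-related token should be the single NEWLINE that ends the statement; no NL appears.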
+2

In addition to the quote from the documentation:

The NEWLINE token indicates the end of a logical line of Python code; NL tokens are generated when a logical line of code is continued over multiple physical lines.

here is an example:

    def a_func(a, b):
        pass

This will create

    1,0-1,3: NAME 'def'
    1,4-1,10: NAME 'a_func'
    1,10-1,11: OP '('
    1,11-1,12: NAME 'a'
    1,12-1,13: OP ','
    1,14-1,15: NAME 'b'
    1,15-1,16: OP ')'
    1,16-1,17: OP ':'
    1,17-1,18: NEWLINE '\n'
    2,0-2,4: INDENT '    '
    2,4-2,8: NAME 'pass'
    2,8-2,9: NEWLINE '\n'
    3,0-3,0: DEDENT ''

While

    def a_func(a,
               b):
        pass

will generate this

    1,0-1,3: NAME 'def'
    1,4-1,10: NAME 'a_func'
    1,10-1,11: OP '('
    1,11-1,12: NAME 'a'
    1,12-1,13: OP ','
    1,13-1,14: NL '\n'
    2,11-2,12: NAME 'b'
    2,12-2,13: OP ')'
    2,13-2,14: OP ':'
    2,14-2,15: NEWLINE '\n'
    3,0-3,4: INDENT '    '
    3,4-3,8: NAME 'pass'
    3,8-3,9: NEWLINE '\n'
    4,0-4,0: DEDENT ''
    4,0-4,0: ENDMARKER ''

Note the token 1,13-1,14: NL '\n' after a, on the first line.


Basically, the difference between NEWLINE and NL is that NL is generated after a physical line that does not "complete" its logical line:

 def a_func(a, b): 

leads to NEWLINE, because the entire logical line is on one physical line.

    def another_func(a,
                     b):

leads to NL after the first physical line, because this one logical line extends over two physical lines.
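
One practical consequence is that counting NEWLINE tokens counts logical lines, while NL tokens mark physical line breaks that end nothing. A rough sketch, assuming Python 3 (the thread itself is about Python 2) and using the made-up helper name count_newline_kinds:

```python
import io
import tokenize


def count_newline_kinds(source):
    """Return (NEWLINE count, NL count) for source.

    Hypothetical helper for illustration; NEWLINE tokens correspond to
    logical lines, NL tokens to newlines that do not end a logical line.
    """
    newline_count = 0
    nl_count = 0
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.NEWLINE:
            newline_count += 1
        elif tok.type == tokenize.NL:
            nl_count += 1
    return newline_count, nl_count


one_line_header = "def a_func(a, b):\n    pass\n"
split_header = "def a_func(a,\n           b):\n    pass\n"

print(count_newline_kinds(one_line_header))
print(count_newline_kinds(split_header))
```

Both versions contain two logical lines (the def header and pass), so the NEWLINE count should be the same; only the split header adds an NL.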

+1

Source: https://habr.com/ru/post/971627/

