Does lexical scoping have a dynamic dimension?

It seems to be a common claim that lexical scope can be resolved at compile time (or by a static analyzer, since my example is in Python), based simply on a location in the source code.

Here is a very simple example, where one function produces two closures with different values for a.

    def elvis(a):
        def f(s):
            return a + ' for the ' + s
        return f

    f1 = elvis('one')
    f2 = elvis('two')
    print(f1('money'), f2('show'))

I have no problem with the fact that when we read the code of the function f and see a, it is not defined in f, so we go to the enclosing function and find it there, and that is what a in f refers to. The location in the source code is enough to tell me that f gets the value of a from the enclosing scope.

But, as described here, when a function is called, its local frame extends its parent environment. So resolving names against the environment at runtime is not a problem. What I am not sure of is whether a static analyzer can always decide, at compile time, which closure instance is being referenced before the code runs. In the example above it is obvious that elvis produces two closures, and they are easy to track, but other cases will not be so simple. Intuitively, I worry that attempting such a static analysis would run into the halting problem.
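To make the worry concrete, here is a contrived sketch of my own (the names make and fs are purely illustrative), where the number of closure instances, and the values they capture, only come into existence at runtime:

    import random

    def make(n):
        def f():
            return n          # which n? that depends on which call to make() created f
        return f

    # how many closures exist, and what each one captured, is decided by runtime data:
    fs = [make(random.randrange(100)) for _ in range(random.randrange(1, 10))]
    print([f() for f in fs])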

So does lexical scoping have a dynamic aspect, where the location in the source code tells us which scope is involved, but not necessarily which closure instance it applies to? Or is this a solved problem in compilers, and can all references from functions to their closures really be worked out statically, in full detail?

Or does the answer depend on the programming language, in which case lexical scoping is not as strong a guarantee as I thought?

[EDIT, in response to comments:

Using my example, I can restate my question: I have read statements such as "lexical scope can be determined at compile time", but I wondered how they could possibly refer to the values of a in f1 and f2 statically / at compile time (in the general case).

The resolution is that lexical scoping does not claim that much. Lexical scoping can tell us, at compile time, which binding a name like a refers to whenever we are inside f (and this really can be determined statically; that is the definition of lexical scope). But determining what value it actually holds (i.e., which closure instance is active) is: 1) outside the concept of lexical scoping, 2) done at runtime (not statically), and therefore dynamic in a sense, but, of course, 3) governed by a different rule than dynamic scoping.
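This split is directly observable in CPython (a quick check, assuming Python 3 and the f1 / f2 from my example above): the two closures share one statically compiled body, while the value of a lives in a per-call closure cell:

    print(f1.__code__ is f2.__code__)        # True: one body, with `a` resolved statically
    print(f1.__code__.co_freevars)           # ('a',): the compiler marked `a` as free
    print(f1.__closure__[0].cell_contents)   # 'one': the dynamic part, set when elvis ran
    print(f2.__closure__[0].cell_contents)   # 'two'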

The takeaway message, quoting @PatrickMaupin: "some dynamic work still has to be done." ]

3 answers

Closures can be implemented in several ways. One of them is to actually capture environments... in other words, consider this example:

    def foo(x):
        y = 1
        z = 2
        def bar(a):
            return (x, y, a)
        return bar

The environment-capture approach works as follows:

  • when foo is called, a local frame is created containing the names x, y, z, bar. The name x is bound to the parameter, the names y and z to 1 and 2, and the name bar to the closure
  • the closure assigned to bar captures the entire parent frame, so when it is called it can look up the name a in its own local frame, and look up x and y in the captured parent frame

With this approach (which is not the approach used by Python), the variable z stays alive for as long as the closure stays alive, even though the closure never refers to it.
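As a rough illustration, here is a hypothetical hand-rolled version of environment capture (the EnvClosure class and foo_env are my own sketch, not how any real implementation is written), mirroring the foo / bar example:

    class EnvClosure:
        # hypothetical helper: a "closure" that keeps the *whole* parent frame alive
        def __init__(self, env, body):
            self.env = env          # the entire parent environment, by reference
            self.body = body

        def __call__(self, *args):
            return self.body(self.env, *args)

    def foo_env(x):
        frame = {'x': x, 'y': 1, 'z': 2}        # the local frame, modeled as a dict
        def bar_body(env, a):
            return (env['x'], env['y'], a)      # x, y from the captured frame; a is local
        frame['bar'] = EnvClosure(frame, bar_body)
        return frame['bar']

    bar = foo_env(10)
    print(bar(3))            # (10, 1, 3)
    print('z' in bar.env)    # True: z is kept alive although bar never touches it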

Another option, somewhat harder to implement, is instead:

  • at compile time, the code is analyzed, and the closure assigned to bar is found to be reading the names x and y from the enclosing scope
  • these two variables are therefore classified as "cells", and they are allocated separately from the local frame
  • the closure stores the addresses of these cells, and each access to them requires a double indirection (the cell is a pointer to where the value is actually stored)

This costs a little extra time when the closure is created, because each captured cell must be copied into the closure object individually (instead of simply copying one pointer to the parent frame), but it has the advantage of not capturing the entire frame: for example, z will not survive the return of foo; only x and y will.
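A hand-rolled sketch of the cell strategy (again hypothetical; the Cell class and the default-argument trick merely stand in for what CPython does natively):

    class Cell:
        # a box around a value: every read/write takes one extra hop,
        # which is the "double indirection" mentioned above
        def __init__(self, value):
            self.value = value

    def foo_cells(x):
        x_cell = Cell(x)      # prologue: copy the passed value into a cell
        y_cell = Cell(1)      # y is captured by the closure, so it lives in a cell
        z = 2                 # z is not captured: a plain local that dies with the frame

        def bar(a, _cells=(x_cell, y_cell)):    # default arg stands in for closure storage
            x_c, y_c = _cells                   # the closure holds cell addresses, not a frame
            return (x_c.value, y_c.value, a)

        return bar

    print(foo_cells(10)(3))   # (10, 1, 3); z was never kept alive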

This is what Python does... basically, at compile time, when a closure (either a named function or a lambda) is encountered, a sub-compilation is performed. During that compilation, whenever a name lookup resolves to a parent function's scope, the variable is marked as a cell.
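This classification is visible on the code objects themselves (assuming the foo defined above, under Python 3):

    bar = foo(12)
    print(foo.__code__.co_cellvars)   # ('x', 'y'): variables foo must allocate as cells
    print(bar.__code__.co_freevars)   # ('x', 'y'): names bar reaches through its closure
    print(bar.__code__.co_varnames)   # ('a',): bar's ordinary fast locals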

One small annoyance is that when a captured variable is a parameter (like x in the foo example), the function prologue needs an extra copy operation to wrap the passed value in a cell. In Python this does not show up in the bytecode, but is performed directly by the call machinery.

Another annoyance is that every access to a captured variable requires a double indirection, even in the parent context.

The advantage is that closures capture only the variables they actually reference, and when they capture none, the generated code is as efficient as that of a regular function.

To see how this works in Python, you can use the dis module to inspect the generated bytecode:

    >>> dis.dis(foo)
      2           0 LOAD_CONST               1 (1)
                  3 STORE_DEREF              1 (y)

      3           6 LOAD_CONST               2 (2)
                  9 STORE_FAST               1 (z)

      4          12 LOAD_CLOSURE             0 (x)
                 15 LOAD_CLOSURE             1 (y)
                 18 BUILD_TUPLE              2
                 21 LOAD_CONST               3 (<code object bar at 0x7f6ff6582270, file "<stdin>", line 4>)
                 24 LOAD_CONST               4 ('foo.<locals>.bar')
                 27 MAKE_CLOSURE             0
                 30 STORE_FAST               2 (bar)

      6          33 LOAD_FAST                2 (bar)
                 36 RETURN_VALUE
    >>>

As you can see, the generated code stores 1 into y using STORE_DEREF (an operation that writes to a cell through the double indirection), but stores 2 into z using STORE_FAST (z is not captured and is just a local of the current frame). By the time the code of foo starts executing, x has already been wrapped in a cell by the call machinery.

bar is just a local variable, so STORE_FAST is used to write to it; but to create the closure, x and y must be copied individually (they are packed into a tuple before the MAKE_CLOSURE operation).

The closure's code can be seen with:

    >>> dis.dis(foo(12))
      5           0 LOAD_DEREF               0 (x)
                  3 LOAD_DEREF               1 (y)
                  6 LOAD_FAST                0 (a)
                  9 BUILD_TUPLE              3
                 12 RETURN_VALUE

and you can see that inside the returned closure, x and y are accessed with LOAD_DEREF. No matter how many levels "up" the hierarchy of nested functions a variable is defined, it is still just one double indirection, because the price is paid when the closure is constructed. Captured variables are only slightly slower to access (by a constant factor) than plain locals... no "scope chain" has to be walked at runtime.

Even more sophisticated compilers, such as SBCL (an optimizing Common Lisp compiler that generates native code), also perform "escape analysis" to determine whether the closure can actually outlive the enclosing function. When it cannot (that is, when bar is only used inside foo and is never stored or returned), the cells can be allocated on the stack instead of the heap, reducing consing (heap allocation of objects that later require garbage collection).

This distinction is known in the literature as "downward / upward funargs": that is, whether the captured variables are visible only at lower levels (in the closure itself, or in deeper closures created inside it) or also at upper levels (i.e., whether my caller can get access to my captured locals).
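In Python terms, the two cases look roughly like this (a sketch only; the function names are mine):

    def sum_with_downward(xs):
        total = 0
        def add(x):               # downward funarg: never escapes this call
            nonlocal total
            total += x
        for x in xs:
            add(x)
        return total              # add (and its cell) may die with this frame

    def make_counter():
        n = 0
        def counter():            # upward funarg: handed back to the caller,
            nonlocal n            # so the cell for n must outlive make_counter
            n += 1
            return n
        return counter

    print(sum_with_downward([1, 2, 3]))   # 6
    c = make_counter()
    print(c(), c())                       # 1 2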

Solving the upward funarg problem in general requires a garbage collector, and it is why C++ closures do not provide this capability.

---

It is a solved problem... in Python, anyway. Python uses purely lexical scoping, and the closure is determined statically. Other languages allow dynamic scoping, where the binding is determined at runtime by searching up the call stack, instead of the lexical nesting seen at analysis time.
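For contrast, dynamic scoping can be emulated in Python by walking the live call stack (a sketch only; dynamic_lookup is a hypothetical helper, and real Python name resolution never does this):

    import sys

    def dynamic_lookup(name):
        frame = sys._getframe(1)            # start at our caller's frame
        while frame is not None:
            if name in frame.f_locals:      # dynamic scoping: nearest *caller* wins
                return frame.f_locals[name]
            frame = frame.f_back            # keep climbing the call stack
        raise NameError(name)

    def show():
        return dynamic_lookup('a')          # resolved against whoever *called* us

    def caller_one():
        a = 'one'
        return show()

    print(caller_one())                     # 'one' under dynamic-scoping rules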

Is this enough explanation?

---

In Python, a variable is local to a function if it is ever assigned in that function (appears on the LHS of an assignment) and is not explicitly declared global or nonlocal.
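For instance (a small sketch of this rule):

    x = 'global'

    def f():
        return x      # x is never assigned in f, so it is not local: reads the global

    def g():
        print(x)      # the assignment below makes x local to ALL of g
        x = 'local'

    print(f())        # 'global'
    # g()             # would raise UnboundLocalError: local 'x' referenced before assignment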

Thus, it is possible to process the lexical chain of scopes and statically determine which identifier will be found in which function. However, some dynamic work still has to be done, because functions can be nested arbitrarily: if function A encloses function B, which encloses function C, then for function C to access a variable from function A, the correct frame for A must be found at runtime. (The same goes for closures.)
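A minimal sketch of that three-level case (the binding of v in C is fixed statically to A's scope, but the cell holding it is located when the closures are built at runtime):

    def A():
        v = 'from A'
        def B():
            def C():
                return v      # compile time: v is a free variable, bound in A
            return C
        return B

    c = A()()          # runtime: this particular call to A created the cell...
    print(c())         # ...that C reads: 'from A'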


Source: https://habr.com/ru/post/1232060/

