How can data persist across multiple calls to a decorated function?

The following function is intended to be used as a decorator that caches results that have already been computed. If the function has already been called with the same arguments, it returns the value stored in the cache dictionary:

    def cached(f):
        f.cache = {}
        def _cachedf(*args):
            if args not in f.cache:
                f.cache[args] = f(*args)
            return f.cache[args]
        return _cachedf

I realized (by accident) that cache does not have to be an attribute of the function object. In fact, the following code works just as well:

    def cached(f):
        cache = {}  # <---- not an attribute this time!
        def _cachedf(*args):
            if args not in cache:
                cache[args] = f(*args)
            return cache[args]
        return _cachedf

I find it difficult to understand how the cache object persists across multiple calls. I ran several cached functions many times and could not find any conflicts or problems between them.
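Here is the kind of experiment I ran, using the second version of cached above (square and double are just made-up examples); each decorated function ends up with its own separate cache:

    @cached
    def square(x):
        print("computing square")
        return x * x

    @cached
    def double(x):
        print("computing double")
        return 2 * x

    square(3)   # prints "computing square", returns 9
    square(3)   # cache hit: returns 9 without printing again
    double(3)   # prints "computing double" -- its cache is completely separate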

Can someone help me understand how the cache variable still exists even after the _cachedf function returns?

2 answers

Here you are creating a closure: the function _cachedf() closes over the variable cache from the enclosing scope. This keeps cache alive for as long as the function object itself is alive.

Edit: Perhaps I should add a few details about how this works in Python and how CPython implements it.

Take a look at a simpler example:

    def f():
        a = []
        def g():
            a.append(1)
            return len(a)
        return g

Interactive interpreter example

    >>> h = f()
    >>> h()
    1
    >>> h()
    2
    >>> h()
    3

When the module containing the function f() is compiled, the compiler sees that the function g() refers to the name a from the enclosing scope, and it records this outward reference in the code object corresponding to f() (specifically, it adds the name a to f.__code__.co_cellvars).

So what happens when f() is called? The first line creates a new list object and binds it to the name a. The next line creates a new function object (using the code object produced at module compilation time) and binds it to the name g. The body of g() is not executed at this point. Finally, that function object is returned.

Since the code object of f() notes that the name a is also referenced by a local function, a "cell" for this name is created when f() is called. The cell holds a reference to the actual list object that a is bound to, and the function g() receives a reference to this cell. That way, both the list object and the cell are kept alive even after the function f() returns.
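You can see this machinery directly in the interpreter by inspecting the attributes CPython uses for it (co_cellvars, co_freevars, __closure__); the 0x... addresses will of course differ on your machine:

    >>> f.__code__.co_cellvars        # names f() must store in cells for inner functions
    ('a',)
    >>> h = f()
    >>> h.__code__.co_freevars        # names g() takes from the enclosing scope
    ('a',)
    >>> h.__closure__                 # the cell that keeps the list alive
    (<cell at 0x...: list object at 0x...>,)
    >>> h.__closure__[0].cell_contents
    []
    >>> h()
    1
    >>> h.__closure__[0].cell_contents
    [1]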


Can someone help me understand how the cache variable still exists even after the _cachedf function returns?

This is due to Python's reference counting. The cache variable is kept alive and accessible because the _cachedf function holds a reference to it, and the caller of cached() holds a reference to _cachedf. When you call that function again, you are still using the same function object that was originally created, so you still have access to its cache.

You will not lose the cache until all references to it are gone. You can use the del statement to do this.

For instance:

    >>> import time
    >>> def cached(f):
    ...     cache = {}  # <---- not an attribute this time!
    ...     def _cachedf(*args):
    ...         if args not in cache:
    ...             cache[args] = f(*args)
    ...         return cache[args]
    ...     return _cachedf
    ...
    >>> def foo(duration):
    ...     time.sleep(duration)
    ...     return True
    ...
    >>> bob = cached(foo)
    >>> bob(2)  # Takes two seconds
    True
    >>> bob(2)  # Returns instantly
    True
    >>> del bob  # Deletes the reference to bob (aka _cachedf), which holds the ref to cache
    >>> bob = cached(foo)
    >>> bob(2)  # Takes two seconds again
    True
    >>>

For the record, what you are trying to achieve is called memoization, and there is a more complete memoizing decorator available from the decorator pattern page, which does the same thing but uses a decorator class. Your code and the class-based decorator are essentially the same, except that the class-based decorator checks that the arguments are hashable before caching them.
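A minimal sketch of what such a class-based memoizing decorator can look like (this is an illustrative reconstruction of the idea, not the exact code from that page):

    import functools

    class Memoized:
        """Cache a function's return values keyed by its arguments."""
        def __init__(self, func):
            self.func = func
            self.cache = {}
            functools.update_wrapper(self, func)  # preserve __name__, __doc__, ...

        def __call__(self, *args):
            try:
                hash(args)  # a tuple is only hashable if all its elements are
            except TypeError:
                # Unhashable arguments (e.g. a list) cannot be dict keys,
                # so skip caching and call the function directly.
                return self.func(*args)
            if args not in self.cache:
                self.cache[args] = self.func(*args)
            return self.cache[args]

    @Memoized
    def fib(n):
        return n if n < 2 else fib(n - 1) + fib(n - 2)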


Edit (2017-02-02): @SiminJie comments that cached(foo)(2) always incurs the delay.

This is because cached(foo) returns a new function with a fresh cache every time it is called. When you write cached(foo)(2), a new (empty) cache is created and the caching function is then called immediately.

Since the cache is empty, no value is found and the underlying function runs again. Instead, do cached_foo = cached(foo) once and then call cached_foo(2) several times; only the first call will incur the delay. It also works as expected when used as a decorator:

    @cached
    def my_long_function(arg1, arg2):
        return long_operation(arg1, arg2)

    my_long_function(1, 2)  # incurs delay
    my_long_function(1, 2)  # doesn't
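For reference, the standard library has shipped an equivalent memoizing decorator since Python 3.2, functools.lru_cache; the same example written with it (still assuming the hypothetical long_operation from above):

    import functools

    @functools.lru_cache(maxsize=None)  # unbounded cache, like the hand-rolled version
    def my_long_function(arg1, arg2):
        return long_operation(arg1, arg2)

    my_long_function(1, 2)  # incurs delay
    my_long_function(1, 2)  # doesn't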

If you are new to decorators, check out this answer to understand what this code means.

