pandas.algos._return_false raises a PicklingError with dill.dump_session on CentOS

My code dumps sessions with dill. This worked fine until I started using pandas. The following code raises a PicklingError on CentOS 6.5:

    import pandas
    import dill
    dill.dump_session('x.dat')

The problem seems to be related to pandas.algos. In fact, running just this is enough to reproduce the error:

    import pandas.algos
    import dill
    dill.dump_session('x.dat')  # or: dill.dumps(pandas.algos)

The error:

    pickle.PicklingError: Can't pickle <cyfunction lambda1 at 0x1df3050>: it's not found as pandas.algos.lambda1

However, the error does not occur on my PC. Both machines have the same versions of pandas (0.14.1), dill (0.2.1) and Python (2.7.6).

Looking at badobjects, I get:

    >>> dill.detect.badobjects(pandas.algos, depth=1)
    {'__builtins__': <module '__builtin__' (built-in)>,
     '_return_true': <cyfunction lambda2 at 0x1484d70>,
     'np': <module 'numpy' from '/usr/local/lib/python2.7/site-packages/numpy-1.8.2-py2.7-linux-x86_64.egg/numpy/__init__.pyc'>,
     '_return_false': <cyfunction lambda1 at 0x1484cc8>,
     'lib': <module 'pandas.lib' from '/home/talkr/.local/lib/python2.7/site-packages/pandas/lib.so'>}

This seems to come down to pandas.algos being built differently on the two operating systems (maybe different compilers?). On my PC, where dump_session runs without error, pandas.algos._return_false is <cyfunction <lambda> at 0x06DD02A0>, while on CentOS it is <cyfunction lambda1 at 0x1df3050>. Why are they different?
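To compare the two machines side by side, something like the following could be run on each. It is only a sketch: the helper names (module_version, function_name, environment_report) are made up for this post, and the Cython version it prints is whatever is importable at runtime, which is not necessarily the version that compiled the installed pandas binary.

```python
import importlib
import platform
import sys


def module_version(name):
    """Return the module's __version__, or None if it cannot be imported."""
    try:
        module = importlib.import_module(name)
    except Exception:  # a broken install may raise more than ImportError
        return None
    return getattr(module, "__version__", "unknown")


def function_name(module_name, attr):
    """Return the __name__ of module_name.attr, or None if unavailable."""
    try:
        module = importlib.import_module(module_name)
    except Exception:
        return None
    obj = getattr(module, attr, None)
    return getattr(obj, "__name__", None)


def environment_report():
    """Collect the facts that differ between the two machines in question."""
    report = {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
    }
    for name in ("pandas", "dill", "Cython"):
        report[name] = module_version(name)
    # pandas.algos is the module path used in this question (pandas 0.14.1);
    # if it cannot be imported on a given install, this entry is simply None.
    report["_return_false.__name__"] = function_name(
        "pandas.algos", "_return_false"
    )
    return report


if __name__ == "__main__":
    for key, value in sorted(environment_report().items()):
        print("%s: %s" % (key, value))
```

If the last entry prints lambda1 on one machine and <lambda> on the other, the two pandas builds differ even though both report 0.14.1.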

1 answer

I am not seeing what you see on a Mac. Here is what I get with the same version of pandas. I do see that you are using a different version of dill; I am using the version from GitHub. I will check whether a change to how dill saves modules or globals could have this effect on some distributions.

    Python 2.7.8 (default, Jul 13 2014, 02:29:54)
    [GCC 4.2.1 Compatible Apple Clang 4.1 ((tags/Apple/clang-421.11.66))] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import pandas
    >>> import dill
    >>> dill.detect.trace(True)
    >>> dill.dump_session('x.pkl')
    M1: <module '__main__' (built-in)>
    F2: <function _import_module at 0x1069ff140>
    D2: <dict object at 0x106a0b280>
    M2: <module 'dill' from '/Users/mmckerns/lib/python2.7/site-packages/dill-0.2.2.dev-py2.7.egg/dill/__init__.pyc'>
    M2: <module 'pandas' from '/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/__init__.pyc'>

Here is what I get for pandas.algos:

    Python 2.7.8 (default, Jul 13 2014, 02:29:54)
    [GCC 4.2.1 Compatible Apple Clang 4.1 ((tags/Apple/clang-421.11.66))] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import pandas.algos
    >>> import dill
    >>> dill.dumps(pandas.algos)
    '\x80\x02cdill.dill\n_import_module\nq\x00U\x0cpandas.algosq\x01\x85q\x02Rq\x03.'

Here is what I get for pandas.algos._return_false:

    Python 2.7.8 (default, Jul 13 2014, 02:29:54)
    [GCC 4.2.1 Compatible Apple Clang 4.1 ((tags/Apple/clang-421.11.66))] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import dill
    >>> import pandas.algos
    >>> dill.dumps(pandas.algos._return_false)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/Users/mmckerns/lib/python2.7/site-packages/dill-0.2.2.dev-py2.7.egg/dill/dill.py", line 180, in dumps
        dump(obj, file, protocol, byref, file_mode, safeio)
      File "/Users/mmckerns/lib/python2.7/site-packages/dill-0.2.2.dev-py2.7.egg/dill/dill.py", line 173, in dump
        pik.dump(obj)
      File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 224, in dump
        self.save(obj)
      File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 317, in save
        self.save_global(obj, rv)
      File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 748, in save_global
        (obj, module, name))
    pickle.PicklingError: Can't pickle <cyfunction lambda1 at 0x10d403cc8>: it's not found as pandas.algos.lambda1

So now I can reproduce your error.

Given how it is built, the object itself does not pickle. It should still be picklable as part of the module, though, as it is for me. You seem to have pinpointed the difference: what the object looks like in the pandas build on CentOS.

Looking at the pandas codebase, pandas.algos is a .pyx file, so it is Cython. Here is the code:

    _return_false = lambda self, other: False

If this were in a .py file, I know it would serialize. I do not know how dill handles Cython-generated lambdas (that is, a lambda cyfunction).

There appears to have been a commit ( https://github.com/pydata/pandas/commit/73c71dfca10012e25c829930508b5d6f7ccad5ff ) in which _return_false was moved out of a class into module scope. Do you see this on both CentOS and your PC? It is possible that v0.14.1 for different distributions was cut from slightly different git revisions, depending on how you installed pandas.

I can dig up lambda1 by asking for the object's source. For a lambda, if dill cannot get the source, it falls back to referring to the object by name, and apparently that name is lambda1, although it does not appear in the .pyx file. Perhaps this comes from the way Cython builds lambdas.

    Python 2.7.8 (default, Jul 13 2014, 02:29:54)
    [GCC 4.2.1 Compatible Apple Clang 4.1 ((tags/Apple/clang-421.11.66))] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import pandas.algos
    >>> import dill
    >>> dill.source.importable(pandas.algos._return_false)
    'from pandas import lambda1\n'
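Since the failure is a by-name lookup, one thing that might be worth trying is exposing the function under the name it reports, so the lookup succeeds. I have not tested this against pandas or dill.dump_session; the sketch below only shows the idea with the standard pickle module and a throwaway module called fake_algos (a made-up name standing in for a compiled extension).

```python
import pickle
import sys
import types

# Hypothetical module standing in for a compiled extension such as pandas.algos.
fake_module = types.ModuleType("fake_algos")
sys.modules["fake_algos"] = fake_module


def _impl(self, other):
    return False


# Make the function report the name "lambda1" and claim to live in fake_algos,
# while the module only exposes it as _return_false.
_impl.__name__ = "lambda1"
_impl.__qualname__ = "lambda1"
_impl.__module__ = "fake_algos"
fake_module._return_false = _impl


def can_pickle(obj):
    """Return True if pickle.dumps(obj) succeeds, False otherwise."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError):
        return False
    return True


before = can_pickle(fake_module._return_false)

# Alias the function under the name it reports, so the by-name lookup works.
fake_module.lambda1 = fake_module._return_false

after = can_pickle(fake_module._return_false)
restored = pickle.loads(pickle.dumps(fake_module._return_false))
print(before, after, restored is fake_module._return_false)
```

Whether the equivalent alias on the real module (assigning pandas.algos._return_false to a lambda1 attribute of pandas.algos) is enough for dill.dump_session is an open question; treat it as an experiment rather than a fix.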

The difference may come from Cython, since the code is generated from the .pyx files in pandas. Which version of Cython do you have? Mine is 0.20.2.


Source: https://habr.com/ru/post/974377/

