Inconsistent cPickle

Why does the following code raise an error when run as a script? It raises no error when pasted into the interactive shell.

    import cPickle as pickle

    val1 = dict(fooblah=[], xy=[])
    pickval1 = pickle.dumps(val1, protocol=2)

    val2 = pickle.loads(pickval1)
    assert val1 == val2

    pickval2 = pickle.dumps(val2, protocol=2)
    assert pickval1 == pickval2, (pickval1, pickval2)

The difference in the pickles is below:

    $ python /tmp/picklefun.py
    Traceback (most recent call last):
      File "/tmp/picklefun.py", line 10, in <module>
        assert pickval1 == pickval2, (pickval1, pickval2)
    AssertionError: ('\x80\x02}q\x01(U\x07fooblahq\x02]U\x02xyq\x03]u.', '\x80\x02}q\x01(U\x07fooblah]U\x02xy]u.')
2 answers

If you replace the line

 val1 = dict(fooblah=[], xy=[]) 

with

 exec "val1 = dict(fooblah=[], xy=[])" 

then the assertions pass again.

Why? The answer lies deep in cPickle's internals. It has an optimization that checks whether an object has a reference count below 2, and if so it omits a few bytes: the memo entry that is normally used to detect cycles or repeated occurrences of the same, possibly large, string. The objects affected here are the strings "fooblah" and "xy". With exec, or when running interactively, by the time compilation is done the only references left to the strings are the ones held by the dictionary; the reference count is 1, so cPickle omits those bytes. But if you write the example as a module, the module is still alive at that point, and it holds another reference to the strings, which it uses as constants.

EDIT to clarify: the second time we pickle, we are pickling the dictionary that was just loaded, which always has fresh keys with a reference count of 1. So the assertion passes if and only if the keys also had a reference count of 1 the first time.
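The omitted bytes can be seen by disassembling the two pickles from the traceback in the question. The sketch below is only an illustration: it assumes Python 3, where cPickle no longer exists, and uses the standard `pickletools` module purely to read the byte strings. The helper names `opcode_names` and `without_memo_writes` are hypothetical.

```python
import pickletools

# The two pickles from the AssertionError in the question, as byte strings.
pickval1 = b'\x80\x02}q\x01(U\x07fooblahq\x02]U\x02xyq\x03]u.'
pickval2 = b'\x80\x02}q\x01(U\x07fooblah]U\x02xy]u.'


def opcode_names(data):
    """Return the opcode names of a pickle, in stream order."""
    return [op.name for op, arg, pos in pickletools.genops(data)]


def without_memo_writes(names):
    """Drop BINPUT opcodes, which only store an object in the memo."""
    return [name for name in names if name != "BINPUT"]


ops1 = opcode_names(pickval1)
ops2 = opcode_names(pickval2)

# The first pickle memoizes the dict and both key strings,
# the second one memoizes only the dict.
print(ops1.count("BINPUT"), ops2.count("BINPUT"))

# Apart from the memo writes, the two streams are identical.
print(without_memo_writes(ops1) == without_memo_writes(ops2))
```

The `q\x02` and `q\x03` bytes after the two strings in the first pickle are exactly these memo writes; nothing else differs between the two streams.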


This seems to be specific to cPickle, because it does not happen with the plain pickle module (I was able to reproduce your error).

So, +1 ... I will keep investigating, because this is an interesting find!

Update:

The cPickle documentation (see the footnotes) guarantees that objects will always be read back correctly, but it does not guarantee that the serialized data is always identical. So this is probably not unexpected behavior, but it is worth noting.

http://docs.python.org/2/library/pickle.html#module-cPickle
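In practice this suggests comparing the unpickled objects rather than the pickle bytes. A minimal sketch, assuming Python 3's `pickle` module and reusing the two byte strings from the traceback in the question; `same_pickled_value` is a hypothetical helper, not part of any library:

```python
import pickle

# The two pickles from the AssertionError in the question, as byte strings.
pickval1 = b'\x80\x02}q\x01(U\x07fooblahq\x02]U\x02xyq\x03]u.'
pickval2 = b'\x80\x02}q\x01(U\x07fooblah]U\x02xy]u.'


def same_pickled_value(a, b):
    """Compare two pickles by the objects they load to, not by their bytes."""
    return pickle.loads(a) == pickle.loads(b)


# The byte strings differ, yet both load to the same dictionary.
print(pickval1 == pickval2)
print(same_pickled_value(pickval1, pickval2))
```

Only unpickle data you trust: `pickle.loads` can execute arbitrary code, so this check is suited to pickles produced by your own program.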


Source: https://habr.com/ru/post/1479637/

