Why is dill much faster and more compact than pickle for NumPy arrays?

I am using Python 2.7 and NumPy 1.11.2, as well as the latest version of dill (I just ran pip install dill), on Ubuntu 16.04.

When storing a NumPy array using pickle, I find that pickle is very slow and stores arrays at almost three times the "required" size.

For example, in the following code, pickle is about 50 times slower than dill (50 s versus 1 s) and creates a 2.2 GB file instead of an 800 MB one.

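import numpy
import pickle
import dill

# 10000 x 10000 float64 values: 800 MB of raw data
B = numpy.random.rand(10000, 10000)

# No protocol argument: each library uses its own default
with open('dill', 'wb') as fp:
    dill.dump(B, fp)
with open('pickle', 'wb') as fp:
    pickle.dump(B, fp)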

I thought dill was just a wrapper around pickle. If that is true, is there a way I can improve pickle's performance myself? Is it generally inadvisable to use pickle for NumPy arrays?

EDIT: Using Python 3, I get the same performance for pickle and dill.

PS: I know about numpy.save, but it is not an option in my case.

+4
2 answers

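This is most likely down to the pickle protocol being used.

For backward compatibility, pickle defaults to an old protocol. In Python 2 the default protocol is 0 and the highest is 2. In Python 3 the default is 3 and the highest is 4 (as of Python 3.6).

Protocol 0 is the original text-based format: it is slow and verbose, but it can be read by every version of Python. Protocol 2 is a considerably more efficient binary format.

By default, dill uses pickle.HIGHEST_PROTOCOL, which is why it comes out faster and smaller here. If you pass pickle.HIGHEST_PROTOCOL to pickle as well, the two should perform about the same: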

with open('dill', 'wb') as fp:
    dill.dump(B, fp, protocol=pickle.HIGHEST_PROTOCOL)
with open('pickle', 'wb') as fp:
    pickle.dump(B, fp, protocol=pickle.HIGHEST_PROTOCOL)
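
To see the effect of the protocol without writing gigabytes to disk, here is a minimal sketch that pickles a much smaller array in memory with every protocol the running interpreter supports and reports the resulting sizes. The helper name pickled_sizes and the 200 x 200 array are illustrative choices, not part of the original question, and the exact numbers will differ between Python 2 and Python 3 and between NumPy versions.

```python
import pickle

import numpy as np


def pickled_sizes(arr):
    """Return {protocol: size in bytes} for every protocol this Python supports."""
    return {
        proto: len(pickle.dumps(arr, protocol=proto))
        for proto in range(pickle.HIGHEST_PROTOCOL + 1)
    }


# A small stand-in for the 10000 x 10000 array in the question.
rng = np.random.RandomState(0)
small = rng.rand(200, 200)

sizes = pickled_sizes(small)
for proto, size in sorted(sizes.items()):
    print("protocol %d: %d bytes (raw data: %d bytes)" % (proto, size, small.nbytes))

# Round-trip check: the highest protocol restores the array unchanged.
restored = pickle.loads(pickle.dumps(small, protocol=pickle.HIGHEST_PROTOCOL))
assert np.array_equal(small, restored)
```

With the highest protocol, the pickled size should be close to the raw size of the data plus a small header, which matches the 800 MB file that dill produced in the question.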
+2

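I am the dill author. dill is an extension of pickle, but it adds some alternative pickling methods for numpy and other objects. For example, dill uses numpy's own methods for pickling arrays.

Additionally, (I believe) dill uses DEFAULT_PROTOCOL by default (not HIGHEST_PROTOCOL) on python3, while on python2 it uses HIGHEST_PROTOCOL by default.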
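
As a rough way to check which protocol a given dump actually used, the sketch below reads the header of a pickle produced by the standard library. It assumes Python 3, where pickle.DEFAULT_PROTOCOL exists, and the helper name protocol_of is hypothetical. It relies on the fact that pickles written with protocol 2 or later begin with the PROTO opcode (byte 0x80) followed by the protocol number. dill is not imported here, but the same check could be applied to the bytes it produces.

```python
import pickle


def protocol_of(blob):
    """Return the protocol recorded in a pickle's header, or None.

    Protocols 2 and later start with the PROTO opcode (0x80) followed by
    one byte holding the protocol number. Protocols 0 and 1 have no header.
    """
    if blob[:1] == b"\x80":
        return ord(blob[1:2])
    return None


data = {"a": [1.0, 2.0, 3.0]}

# No protocol argument: pickle falls back to DEFAULT_PROTOCOL.
default_blob = pickle.dumps(data)
# Explicitly request the newest protocol this interpreter knows.
highest_blob = pickle.dumps(data, protocol=pickle.HIGHEST_PROTOCOL)

print("default protocol:", pickle.DEFAULT_PROTOCOL, "->", protocol_of(default_blob))
print("highest protocol:", pickle.HIGHEST_PROTOCOL, "->", protocol_of(highest_blob))
```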

+3

Source: https://habr.com/ru/post/1679878/

