Why is dill much faster and more compact than pickle for NumPy arrays?

Question

I am using Python 2.7 and NumPy 1.11.2, along with the latest version of dill (I just ran pip install dill), on Ubuntu 16.04.

When storing a NumPy array with pickle, I find that pickle is very slow and writes files almost three times the "necessary" size.

For example, in the following code, pickle is about 50 times slower than dill (50 s versus 1 s) and creates a 2.2 GB file instead of 800 MB.

import numpy
import pickle
import dill
# 10000 x 10000 float64 array: 800 MB of raw data
B = numpy.random.rand(10000, 10000)
with open('dill', 'wb') as fp:
    dill.dump(B, fp)
with open('pickle', 'wb') as fp:
    pickle.dump(B, fp)

I thought dill was just a wrapper around pickle. If that is true, is there a way I can improve pickle's performance myself? Is it generally inadvisable to use pickle for NumPy arrays?

EDIT: Using Python 3, I get the same performance for pickle and dill.

PS: I know about numpy.save, but [...].

+4

python numpy serialization pickle dill

Bananach 22 . '17 10:56

2 Answers

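I'm the dill author. dill is an extension of pickle, but it adds some alternate pickling methods for numpy and other objects. For example, dill leverages numpy's own methods for pickling an array.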

Also, (I believe) dill uses DEFAULT_PROTOCOL by default (not HIGHEST_PROTOCOL) on python3, while on python2 it uses HIGHEST_PROTOCOL by default.

+3

Mike McKerns 20 . '17 14:54

Gaëtan de Menten · Accepted Answer · 2017-09-20T10:15:23+0000

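In Python 2, pickle uses protocol 0 by default, and the highest available protocol is 2. In Python 3, the default is 3 and the highest is 4 (as of Python 3.6).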
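If you are unsure which protocols your interpreter uses, you can check directly. This is only a small sketch: the variable names are mine, and the fallback to 0 assumes Python 2, where pickle has no DEFAULT_PROTOCOL attribute.

```python
import pickle
import sys

# pickle.DEFAULT_PROTOCOL exists only on Python 3; on Python 2 the
# default is protocol 0, so fall back to that (assumption for Python 2).
default_protocol = getattr(pickle, "DEFAULT_PROTOCOL", 0)
highest_protocol = pickle.HIGHEST_PROTOCOL

print("Python %d.%d" % sys.version_info[:2])
print("default protocol: %d" % default_protocol)
print("highest protocol: %d" % highest_protocol)
```

The numbers printed depend on the Python version, so newer interpreters may report higher values than the ones quoted above.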

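Each protocol version improves on the previous one, but protocol 0 is especially slow and wasteful for large arrays. It is a text-based format, kept mainly for backward compatibility with very old versions of Python. Protocol 2 is already much better.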
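To get a feel for the size difference between protocols without NumPy or dill installed, here is a rough sketch using only the standard library. A plain list of floats stands in for the array, so the exact ratio for a real NumPy array will differ. The helper name pickled_sizes is made up for this example.

```python
import pickle
import random


def pickled_sizes(obj, protocols=(0, 2, pickle.HIGHEST_PROTOCOL)):
    """Return a dict mapping each protocol number to the pickled size in bytes."""
    return {p: len(pickle.dumps(obj, protocol=p)) for p in protocols}


# Stand-in data: a list of floats, not a NumPy array (assumption for illustration).
random.seed(0)
values = [random.random() for _ in range(100000)]

sizes = pickled_sizes(values)
for protocol, size in sorted(sizes.items()):
    print("protocol %d: %d bytes" % (protocol, size))

# Every protocol should still round-trip the data unchanged.
restored = pickle.loads(pickle.dumps(values, protocol=0))
```

Protocol 0 writes each float as decimal text, while the binary protocols store it in 8 bytes plus an opcode, which is where most of the difference comes from in this toy case.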

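Apparently, dill uses pickle.HIGHEST_PROTOCOL by default, which would explain the difference you see. If you pass pickle.HIGHEST_PROTOCOL to pickle explicitly, you should get the same performance from both: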

with open('dill', 'wb') as fp:
    dill.dump(B, fp, protocol=pickle.HIGHEST_PROTOCOL)
with open('pickle', 'wb') as fp:
    pickle.dump(B, fp, protocol=pickle.HIGHEST_PROTOCOL)
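If you pickle in many places, one option is to wrap the call so the protocol is never forgotten. The following is just a sketch under that assumption: dump_binary and load_binary are hypothetical helper names, and a small dict stands in for the array.

```python
import os
import pickle
import tempfile


def dump_binary(obj, path, protocol=pickle.HIGHEST_PROTOCOL):
    """Pickle obj to path, using the highest protocol unless told otherwise."""
    with open(path, "wb") as fp:
        pickle.dump(obj, fp, protocol=protocol)


def load_binary(path):
    """Load a pickled object; the protocol is detected from the file itself."""
    with open(path, "rb") as fp:
        return pickle.load(fp)


# Stand-in data written to a temporary directory (names are illustrative).
path = os.path.join(tempfile.mkdtemp(), "data.pkl")
data = {"rows": [[0.1, 0.2], [0.3, 0.4]]}
dump_binary(data, path)
restored = load_binary(path)
print(restored == data)
```

Keep in mind that a file written with the highest protocol of a newer Python may not be readable by an older interpreter, so pass an explicit lower protocol if the files need to be shared across versions.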
