Pickle file size when pickling numpy arrays or lists

I have thousands of tuples of long (8640 elements each) lists of integers. For example:

type(l1)
tuple

len(l1)
2

l1[0][:10]
[0, 31, 23, 0, 0, 0, 0, 0, 0, 0]

l1[1][:10]
[0, 0, 11, 16, 24, 0, 0, 0, 0, 0] 

I am "pickling" the tuples, and it seems that when the tuples are made up of lists the pickle file is lighter than when they are made up of numpy arrays. I am not new to Python, but I am by no means an expert, and I do not know how memory is managed for different types of objects. I would have expected numpy arrays to be lighter, but this is what I get when I pickle the different kinds of objects:

import pickle
import numpy as np

# each element of the tuple as a numpy array (default integer dtype)
l2 = [np.asarray(l1[i]) for i in range(len(l1))]
l2
[array([ 0, 31, 23, ...,  2,  0,  0]), array([ 0,  0, 11, ...,  1,  0,  0])]

#integers in the array are small enough to be saved in two bytes
l3 = [np.asarray(l1[i], dtype='u2') for i in range(len(l1))]
l3
[array([ 0, 31, 23, ...,  2,  0,  0], dtype=uint16),
 array([ 0,  0, 11, ...,  1,  0,  0], dtype=uint16)]

#the original tuple of lists
with open('file1.pkl','w') as f:
    pickle.dump(l1, f)

# the same data as a list of numpy arrays
with open('file2.pkl','w') as f:
    pickle.dump(l2, f)

# the same data as uint16 numpy arrays
with open('file3.pkl','w') as f:
    pickle.dump(l3, f)

and when I check the file size:

$ du -h file1.pkl
72K    file1.pkl

$ du -h file2.pkl
540K   file2.pkl

$ du -h file3.pkl
136K   file3.pkl
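
For reference, roughly the same comparison can be made in memory, without writing any files, by checking the length of the pickled byte strings (same session and objects as above; these are raw byte counts rather than du's block-rounded sizes):

# in-memory size of each pickled object, in bytes
print(len(pickle.dumps(l1)))   # tuple of plain lists
print(len(pickle.dumps(l2)))   # list of default-dtype numpy arrays
print(len(pickle.dumps(l3)))   # list of uint16 numpy arrays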

So the original tuple of plain lists (file 1) ends up lighter than even the uint16 arrays (file 3), and the default-dtype arrays (file 2) are the heaviest of all. This matters to me because I have a large number of these tuples to store, and I keep them in a data frame (pandas), so the size on disk adds up quickly.

In case it is relevant, this is how I am storing the pickled tuples, as a new column of the data frame:

# list of pickled byte strings produced by pickle.dumps
tpl_pkl = [pickle.dumps(listoftuples[i]) for i in xrange(len(listoftuples))]

# insert the pickled tuples as a new column of an existing pandas data frame
df['tuples'] = tpl_pkl
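
Reading a tuple back out of the frame is just the reverse operation; a minimal sketch, assuming df is the data frame from the snippet above (the row position 0 is only an example):

# recover the original tuple of lists from the first row of the column
first_tuple = pickle.loads(df['tuples'].iloc[0])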

Question: why do the numpy arrays end up heavier than the plain lists when pickled, and is there a way to avoid it?

If there is a better way to store this kind of data altogether, I am open to suggestions.

Thanks in advance.

numpy arrays do not come out very compact when you pass them to pickle this way; if the goal is simply to get the arrays onto disk, numpy.save() (or numpy.savez() for several arrays in one file) writes them in an efficient binary format.

And since you are already working with pandas, it is also worth looking at the storage options a data frame gives you for a column like this.
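
A rough sketch of what that could look like; the file names and the example data are arbitrary, and the second variant is only there to illustrate that the large files above come from pickle's default protocol 0 (the plain-ASCII protocol Python 2 uses unless told otherwise), not from numpy as such:

import pickle
import numpy as np

# two 8640-element uint16 arrays, shaped like the data in the question
arrs = [np.zeros(8640, dtype='u2'), np.zeros(8640, dtype='u2')]

# option 1: let numpy do the serialization; an .npz file holds several named arrays
np.savez('file4.npz', first=arrs[0], second=arrs[1])
loaded = np.load('file4.npz')        # loaded['first'], loaded['second'] return the arrays

# option 2: keep pickle, but open the file in binary mode and use a binary protocol;
# with protocol 0 every byte of the array buffer gets escaped into ASCII text, which
# is what inflates file2.pkl and file3.pkl above
with open('file5.pkl', 'wb') as f:
    pickle.dump(arrs, f, protocol=2)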


Source: https://habr.com/ru/post/1606628/

