I have thousands of tuples of long integer lists (8640 elements each). For example:
type(l1)
tuple
len(l1)
2
l1[0][:10]
[0, 31, 23, 0, 0, 0, 0, 0, 0, 0]
l1[1][:10]
[0, 0, 11, 16, 24, 0, 0, 0, 0, 0]
I am pickling these tuples, and it seems that when the tuples are made up of plain lists, the pickle file is smaller than when they are made up of numpy arrays. I am not new to Python, but I am by no means an expert, and I do not know exactly how memory is managed for different types of objects. I would have expected numpy arrays to be lighter, but this is what I get when I pickle different kinds of objects:
import numpy as np

l2 = [np.asarray(l1[i]) for i in range(len(l1))]
l2
[array([ 0, 31, 23, ..., 2, 0, 0]), array([ 0, 0, 11, ..., 1, 0, 0])]
l3 = [np.asarray(l1[i], dtype='u2') for i in range(len(l1))]
l3
[array([ 0, 31, 23, ..., 2, 0, 0], dtype=uint16),
array([ 0, 0, 11, ..., 1, 0, 0], dtype=uint16)]
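To be fair, in memory the numpy versions do look lighter; a rough check on a stand-in list of 8640 small ints (not my real data) gives:

import sys
import numpy as np

lst = list(range(8640))               # stand-in for one of the real lists
arr64 = np.asarray(lst)               # default dtype, int64 on most 64-bit builds
arr16 = np.asarray(lst, dtype='u2')

# getsizeof counts only the list object and its array of pointers;
# the integers inside are separate Python objects on top of that.
print(sys.getsizeof(lst))             # list header + 8640 pointers
print(arr64.nbytes, arr16.nbytes)     # 8640*8 = 69120 and 8640*2 = 17280 bytes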
import pickle

with open('file1.pkl', 'w') as f:
    pickle.dump(l1, f)
with open('file2.pkl', 'w') as f:
    pickle.dump(l2, f)
with open('file3.pkl', 'w') as f:
    pickle.dump(l3, f)
and when I check the file sizes:

$ du -h file1.pkl
72K     file1.pkl
$ du -h file2.pkl
540K    file2.pkl
$ du -h file3.pkl
136K    file3.pkl
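The same comparison can be reproduced in memory with pickle.dumps on stand-in data; the pickle protocol seems to be the big variable here (I assume my dumps above used protocol 0, the Python 2 default):

import pickle
import numpy as np

# Stand-in for one real tuple: two lists of 8640 small ints (not my actual data).
t = ([i % 40 for i in range(8640)], [(i * 7) % 40 for i in range(8640)])
t_u2 = [np.asarray(x, dtype='u2') for x in t]

# Protocol 0 is ASCII-based and escapes the raw array bytes, which inflates
# them; a binary protocol like 2 writes them as-is.
for proto in (0, 2):
    print(proto, len(pickle.dumps(t, proto)), len(pickle.dumps(t_u2, proto)))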
So, against my intuition, the plain lists (file 1) pickle to a smaller file than even the uint16 arrays (file 3), and the default int64 arrays (file 2) produce by far the heaviest one. Since in the end I want to store these tuples in a DataFrame column (with pandas), for the time being I am pickling each tuple separately, like this:
tpl_pkl = [pickle.dumps(listoftuples[i]) for i in xrange(len(listoftuples))]
df['tuples'] = tpl_pkl
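For completeness, a minimal self-contained version of that step (with made-up data standing in for listoftuples) plus a round-trip check:

import pickle
import pandas as pd

# Made-up data standing in for my real listoftuples.
listoftuples = [(list(range(5)), list(range(5, 10))) for _ in range(3)]

tpl_pkl = [pickle.dumps(t) for t in listoftuples]
df = pd.DataFrame({'tuples': tpl_pkl})

# Each cell unpickles back to the original tuple.
assert pickle.loads(df['tuples'].iloc[0]) == listoftuples[0]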
So my question is: is it normal that pickled numpy arrays end up heavier than pickled lists of the same integers, or am I missing something? And if it is normal, is there a way to make the numpy version more compact on disk?
Thanks in advance.