How do I save a big (but not huge) dictionary in Python?

My dictionary will consist of several thousand keys, each of which has a 1000x1000 numpy array as a value. I do not need a human readable file. Small size and fast loading are essential.

At first I tried savemat, but I ran into problems. Pickle produced a huge file, and I assume csv would do the same. I have read posts recommending json (readable text, so probably huge) or a database (supposedly complex). What would you recommend for my case?

+4
5 answers

If you have a dictionary where the keys are strings and the values are arrays, for example:

>>> import numpy
>>> arrs = {'a': numpy.array([1,2]), 'b': numpy.array([3,4]), 'c': numpy.array([5,6])}

You can use numpy.savez to save them, by keyword, into a single .npz archive:

 >>> numpy.savez('file.npz', **arrs) 

To load it back:

>>> npzfile = numpy.load('file.npz')
>>> npzfile
<numpy.lib.npyio.NpzFile object at 0x1fa7610>
>>> npzfile['a']
array([1, 2])
>>> npzfile['b']
array([3, 4])
>>> npzfile['c']
array([5, 6])
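
If disk size matters more than load speed, numpy also provides numpy.savez_compressed, which writes the same kind of .npz archive but compresses each array; a one-line variant using the same arrs dictionary from above:

 >>> numpy.savez_compressed('file.npz', **arrs)

It loads back with numpy.load exactly as shown above, at the cost of some extra compression and decompression time.
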
+6

The file system itself is an often underrated data structure. You can keep a dictionary that maps your keys to file names, and each file holds one 1000x1000 array. Pickling the dictionary is quick and easy, and the data files then contain only the raw arrays (which numpy can load easily).
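
A minimal sketch of that layout, assuming string keys, a writable data/ directory, and placeholder file names (none of these names come from the answer itself):

import os
import pickle
import numpy

def save_arrays(arrs, directory='data'):
    # Each array goes to its own raw .npy file; only the small
    # key -> filename map is pickled.
    os.makedirs(directory, exist_ok=True)
    index = {}
    for i, (key, arr) in enumerate(arrs.items()):
        path = os.path.join(directory, 'arr_%d.npy' % i)
        numpy.save(path, arr)
        index[key] = path
    with open(os.path.join(directory, 'index.pkl'), 'wb') as f:
        pickle.dump(index, f)

def load_arrays(directory='data'):
    # Read the pickled map back and load each array from its file.
    with open(os.path.join(directory, 'index.pkl'), 'rb') as f:
        index = pickle.load(f)
    return {key: numpy.load(path) for key, path in index.items()}
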

+3

What about numpy.savez? It can store multiple numpy arrays in one file, and the format is binary, so it should be faster than pickle.
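
For the dictionary in the question that is one call in each direction; a small sketch with stand-in data (the file name and the example arrays are just placeholders):

import numpy

arrs = {'a': numpy.zeros((1000, 1000)), 'b': numpy.ones((1000, 1000))}  # stand-in data

# ** unpacks the dict so each key becomes an array name inside the .npz archive.
numpy.savez('arrays.npz', **arrs)

# NpzFile behaves like a lazy, read-only mapping keyed by those names.
with numpy.load('arrays.npz') as npzfile:
    restored = {key: npzfile[key] for key in npzfile.files}
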

+2

The Google Protocol Buffers format is designed to have very low overhead. I'm not sure how fast it serializes, but since it comes from Google, I assume it is not shabby.

0

You can use PyTables (http://www.pytables.org/moin) and save your data in HDF5 format.
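
A minimal sketch of that approach, assuming the PyTables 3.x API and keys that are valid HDF5 node names (the file name, compression settings, and stand-in data are assumptions, not part of the answer):

import numpy
import tables

arrs = {'a': numpy.zeros((1000, 1000)), 'b': numpy.ones((1000, 1000))}  # stand-in data

# Write one compressed array node per dictionary key.
filters = tables.Filters(complevel=5, complib='zlib')
with tables.open_file('arrays.h5', mode='w') as h5:
    for key, arr in arrs.items():
        h5.create_carray(h5.root, key, obj=arr, filters=filters)

# Read the nodes back and rebuild the dictionary.
with tables.open_file('arrays.h5', mode='r') as h5:
    restored = {node._v_name: node.read() for node in h5.iter_nodes(h5.root)}
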

0
