How do I save a big (but not huge) dictionary in Python?

My dictionary will consist of several thousand keys, each of which has a 1000x1000 numpy array as a value. I do not need a human readable file. Small size and fast loading are essential.

At first I tried savemat, but I ran into problems. Pickle produced a huge file, and I assume csv would do the same. I have read posts recommending json (readable text, so probably huge) or a database (supposedly complex). What would you recommend for my case?

+4
5 answers

If you have a dictionary where the keys are strings and the values are arrays, for example:

>>> import numpy
>>> arrs = {'a': numpy.array([1,2]), 'b': numpy.array([3,4]), 'c': numpy.array([5,6])}

You can use numpy.savez to save them, by keyword, into a single .npz archive:

 >>> numpy.savez('file.npz', **arrs) 

To load it back:

>>> npzfile = numpy.load('file.npz')
>>> npzfile
<numpy.lib.npyio.NpzFile object at 0x1fa7610>
>>> npzfile['a']
array([1, 2])
>>> npzfile['b']
array([3, 4])
>>> npzfile['c']
array([5, 6])
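
If disk size matters more than load speed, numpy also provides numpy.savez_compressed, which writes the same kind of .npz archive but compresses each array; a one-line variant using the same arrs dictionary from above:

 >>> numpy.savez_compressed('file.npz', **arrs)

It loads back with numpy.load exactly as shown above, at the cost of some extra compression and decompression time.
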
+6

The file system itself is an often underrated data structure. You can keep a dictionary that maps your keys to file names, and each file holds one 1000x1000 array. Pickling the dictionary is quick and easy, and the data files then contain only the raw arrays (which numpy can load easily).
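
A minimal sketch of that layout, assuming string keys, a writable data/ directory, and placeholder file names (none of these names come from the answer itself):

import os
import pickle
import numpy

def save_arrays(arrs, directory='data'):
    # Each array goes to its own raw .npy file; only the small
    # key -> filename map is pickled.
    os.makedirs(directory, exist_ok=True)
    index = {}
    for i, (key, arr) in enumerate(arrs.items()):
        path = os.path.join(directory, 'arr_%d.npy' % i)
        numpy.save(path, arr)
        index[key] = path
    with open(os.path.join(directory, 'index.pkl'), 'wb') as f:
        pickle.dump(index, f)

def load_arrays(directory='data'):
    # Read the pickled map back and load each array from its file.
    with open(os.path.join(directory, 'index.pkl'), 'rb') as f:
        index = pickle.load(f)
    return {key: numpy.load(path) for key, path in index.items()}
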

+3

What about numpy.savez? It can store multiple numpy arrays in one file, and the format is binary, so it should be faster than pickle.
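
For the dictionary in the question that is one call in each direction; a small sketch with stand-in data (the file name and the example arrays are just placeholders):

import numpy

arrs = {'a': numpy.zeros((1000, 1000)), 'b': numpy.ones((1000, 1000))}  # stand-in data

# ** unpacks the dict so each key becomes an array name inside the .npz archive.
numpy.savez('arrays.npz', **arrs)

# NpzFile behaves like a lazy, read-only mapping keyed by those names.
with numpy.load('arrays.npz') as npzfile:
    restored = {key: npzfile[key] for key in npzfile.files}
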

+2

The Google Protocol Buffers format is designed to have very low overhead. I'm not sure how fast it serializes, but since it comes from Google, I assume it is not shabby.

0

You can use PyTables (http://www.pytables.org/moin) and save your data in HDF5 format.
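
A minimal sketch of that approach, assuming the PyTables 3.x API and keys that are valid HDF5 node names (the file name, compression settings, and stand-in data are assumptions, not part of the answer):

import numpy
import tables

arrs = {'a': numpy.zeros((1000, 1000)), 'b': numpy.ones((1000, 1000))}  # stand-in data

# Write one compressed array node per dictionary key.
filters = tables.Filters(complevel=5, complib='zlib')
with tables.open_file('arrays.h5', mode='w') as h5:
    for key, arr in arrs.items():
        h5.create_carray(h5.root, key, obj=arr, filters=filters)

# Read the nodes back and rebuild the dictionary.
with tables.open_file('arrays.h5', mode='r') as h5:
    restored = {node._v_name: node.read() for node in h5.iter_nodes(h5.root)}
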

0
