Using compression with Pandas and HDF5 / HDFStore

For several aspects of the project, using an HDF5 store would be ideal. However, the files become massive and, frankly, we are running out of space.

This statement:

  store.put(storekey, data, table=False, compression='gzip')

produces a file that is no smaller than the one produced by:

  store.put(storekey, data, table=False)

Is compression used even when going through Pandas?

If this is not possible, I am not opposed to using h5py; however, I am not sure what to use for the "data type", since the DataFrame contains all kinds of types (strings, floats, ints, etc.).

Any help / understanding would be appreciated!

1 answer

See the docs regarding compression using HDFStore.

gzip is not a valid compression option (and the invalid value is silently ignored rather than reported). Try any of zlib, bzip2, lzo, or blosc (additional libraries may be required for bzip2 / lzo).

See the PyTables docs for details on the various compression options.
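
To check whether compression is actually being applied, one option is to write the same frame twice and compare the file sizes. The following is only a rough sketch, assuming a recent pandas release with PyTables installed, where compression is selected with the complib and complevel keywords on HDFStore rather than the compression keyword from the question. The helper names write_store and compare_sizes and the sample data are made up for illustration.

```python
import os
import tempfile

import numpy as np
import pandas as pd


def write_store(path, data, complib=None, complevel=None):
    """Write `data` to a new HDF5 file and return the file size in bytes."""
    # Compression is configured on the store itself (an assumption about
    # recent pandas versions), so it also applies to fixed-format puts.
    with pd.HDFStore(path, mode="w", complib=complib, complevel=complevel) as store:
        store.put("data", data, format="fixed")
    return os.path.getsize(path)


def compare_sizes(n_rows=50_000):
    """Return (uncompressed_size, zlib_size) for the same sample frame."""
    # Hypothetical, highly repetitive numeric data so the effect is visible.
    data = pd.DataFrame(np.tile(np.arange(10.0), (n_rows, 1)))
    with tempfile.TemporaryDirectory() as tmp:
        plain = write_store(os.path.join(tmp, "plain.h5"), data)
        packed = write_store(
            os.path.join(tmp, "zlib.h5"), data, complib="zlib", complevel=9
        )
    return plain, packed


if __name__ == "__main__":
    plain, packed = compare_sizes()
    print(f"uncompressed: {plain} bytes, zlib level 9: {packed} bytes")
```

If the two sizes come out the same, the compression settings are probably not reaching PyTables. How much space is saved depends heavily on the data; real frames with string columns may behave differently from this numeric example.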

Here's a semi-related question.


Source: https://habr.com/ru/post/951885/

