Why is my HDF5 file so large when I store empty pandas Series in it?

I create an HDF5 file with pandas using the following code:

    import pandas as pd

    store = pd.HDFStore("store.h5")
    for x in range(1000):
        store["name" + str(x)] = pd.Series()

All the Series are empty, so why does store.h5 take up 1.1 GB of disk space?

1 answer

Short version: you have found a bug. Quoting the related bug report on GitHub:

... it took a bit of hacking (pytables doesn't like objects with zero length)

I can reproduce this bug on my machine. Simply changing the code to this:

    import pandas as pd

    store = pd.HDFStore("store.h5")
    for x in range(1000):
        store["name" + str(x)] = pd.Series([1, 2])

produces a file of normal size, a few megabytes. I could not find an open issue for this on GitHub; you could try reporting it.

I assume you have already worked around the problem in your code, but if not, the simplest fix is probably to check that none of the object's dimensions is zero before storing it:

    import numpy as np
    import pandas as pd
    toStore = pd.Series()
    assert np.prod(toStore.shape) != 0, 'Tried to store an empty object!'  # fails here: shape is (0,)
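The same guard can be wrapped in a small helper so that empty objects are skipped instead of aborting the whole loop. This is a minimal sketch: the names `has_zero_length_dim` and `store_if_nonempty` are hypothetical, and the helper only assumes the store supports item assignment (`pd.HDFStore` does, and so does a plain `dict`, which is used below as a stand-in so the demo does not need PyTables). Checking `np.prod(obj.shape)` rather than `len(obj)` also catches a DataFrame that has rows but no columns, whose `len()` is not zero.

```python
import numpy as np
import pandas as pd


def has_zero_length_dim(obj) -> bool:
    """True if any dimension of a pandas/NumPy object is zero, e.g. shape (0,) or (3, 0)."""
    return int(np.prod(obj.shape)) == 0


def store_if_nonempty(store, key, obj) -> bool:
    """Hypothetical helper: write obj under key only if no dimension is zero.

    `store` is assumed to support item assignment (pd.HDFStore or a plain dict).
    Returns True if the object was written, False if it was skipped.
    """
    if has_zero_length_dim(obj):
        return False
    store[key] = obj
    return True


# Demo with a dict standing in for pd.HDFStore("store.h5")
demo_store = {}
candidates = {
    "empty_series": pd.Series(dtype="float64"),    # shape (0,)
    "two_values": pd.Series([1, 2]),               # shape (2,)
    "no_rows": pd.DataFrame(columns=["a", "b"]),   # shape (0, 2)
    "no_columns": pd.DataFrame(index=[0, 1, 2]),   # shape (3, 0), yet len() == 3
}
for key, obj in candidates.items():
    written = store_if_nonempty(demo_store, key, obj)
    print(key, obj.shape, "stored" if written else "skipped")
```

In the question's loop this would become `store_if_nonempty(store, "name" + str(x), series)`, so only non-empty objects ever reach PyTables.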

Source: https://habr.com/ru/post/988529/

