HDF5 file to pandas DataFrame

I have a dataset that is stored in .h5 files. I need to keep only certain columns and be able to manipulate the data in them.

To do this, I tried loading it into a pandas DataFrame. I tried:

pd.read_hdf(path)

But I get: No dataset in HDF5 file.

I found an answer on SO ( read HDF5 file on pandas DataFrame with conditions ), but I do not need conditions, and that answer sets up the conditions when the file is written. I am not the creator of the file, so I cannot do anything about that.

I also tried using h5py:

df = h5py.File(path)

But this is not easy to manipulate, and I cannot get the columns out of it (only the column names, with df.keys()). Any idea how to do this?

First check what is stored in the HDF file:

In [4]: fn = r'D:\temp\.data\test.h5'

In [5]: store = pd.HDFStore(fn)

In [6]: print(store)
<class 'pandas.io.pytables.HDFStore'>
File path: D:\temp\.data\test.h5
/test            frame_table  (typ->appendable,nrows->7,ncols->4,indexers->[index],dc->[Col1,Col2,Col3,Col4])

In [7]: df = store.select('test')

In [8]: df
Out[8]:
        Col1      Col2  Col3  Col4
0       what       the     0     0
1        are    curves     1     8
2        men        of     2    16
3         to      your     3    24
4      rocks      lips     4    32
5        and   rewrite     5    40
6  mountains  history.     6    48
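
Since the question is about keeping only certain columns: for a store in table format like the one printed above, select also takes a columns argument, so the unwanted columns never have to be loaded. A minimal sketch, assuming PyTables is installed. The temporary file and its numeric contents are invented here as a stand-in for the real file; only the key 'test' and the column names are borrowed from the session above.

```python
import os
import tempfile

import pandas as pd

# Hypothetical stand-in for the real file: a small table-format store.
fn = os.path.join(tempfile.mkdtemp(), "test.h5")
pd.DataFrame({
    "Col1": [1.5, 2.5, 3.5],
    "Col2": [10, 20, 30],
    "Col3": [0, 1, 2],
    "Col4": [0, 8, 16],
}).to_hdf(fn, key="test", format="table", data_columns=True)

with pd.HDFStore(fn, mode="r") as store:
    keys = store.keys()  # names of the objects stored in the file
    # Read only the columns that are needed.
    subset = store.select("test", columns=["Col1", "Col3"])

print(keys)
print(subset)
```

This only works if the file was written by pandas in table format. If pd.read_hdf reports that there is no dataset, the file was probably written by another tool and has to be read with h5py instead.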

To get the data into pandas, go through h5py, then np.array, then DataFrame:

df = pd.DataFrame(np.array(h5py.File(path)['variable_1']))
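
If every key in the file is a one-dimensional dataset of the same length (which is what the question suggests, since df.keys() returns the column names), several of them can be pulled into one DataFrame at once. A rough sketch under that assumption. The file built here and the names variable_2 and variable_3 are hypothetical, and the real file may well use groups or compound datasets instead.

```python
import os
import tempfile

import h5py
import numpy as np
import pandas as pd

# Hypothetical stand-in for a file that was not written by pandas:
# three top-level datasets, one per "column".
path = os.path.join(tempfile.mkdtemp(), "plain.h5")
with h5py.File(path, "w") as f:
    f.create_dataset("variable_1", data=np.arange(5))
    f.create_dataset("variable_2", data=np.linspace(0.0, 1.0, 5))
    f.create_dataset("variable_3", data=np.arange(5) * 10)

wanted = ["variable_1", "variable_3"]  # the columns to keep

with h5py.File(path, "r") as f:
    available = list(f.keys())  # all dataset names in the file
    # f[name][()] reads the whole dataset into a NumPy array.
    df = pd.DataFrame({name: f[name][()] for name in wanted})

print(available)
print(df)
```

Reading inside the with block matters: once the file is closed, the h5py dataset objects can no longer be read, so the arrays are copied into the DataFrame before that happens.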

Source: https://habr.com/ru/post/1660115/

