I am looking for a way to append data to an existing dataset in a .h5 file using Python (h5py).
A brief introduction to my project: I am training a CNN on medical image data. Because of the huge amount of data and the heavy memory usage when converting it to NumPy arrays, I need to split the conversion into batches: load and preprocess the first 100 medical images, save the resulting NumPy arrays to an .h5 file, then load the next 100 images and append them to the existing .h5 file, and so on.
So far, I save the first 100 converted NumPy arrays as follows:
```python
import h5py
from LoadIPV import LoadIPV

X_train_data, Y_train_data, X_test_data, Y_test_data = LoadIPV()

with h5py.File('.\PreprocessedData.h5', 'w') as hf:
    hf.create_dataset("X_train", data=X_train_data, maxshape=(None, 512, 512, 9))
    hf.create_dataset("X_test", data=X_test_data, maxshape=(None, 512, 512, 9))
    hf.create_dataset("Y_train", data=Y_train_data, maxshape=(None, 512, 512, 1))
    hf.create_dataset("Y_test", data=Y_test_data, maxshape=(None, 512, 512, 1))
```
As you can see, the converted NumPy arrays are split into four "groups", which are stored in four HDF5 datasets [X_train, X_test, Y_train, Y_test]. The LoadIPV() function performs the preprocessing of the medical image data.
My problem is that I would like to save the next 100 NumPy arrays into the same .h5 file, appended to the existing datasets: that is, I would like to extend, for example, the existing X_train dataset of shape [100, 512, 512, 9] with the next 100 NumPy arrays, so that X_train ends up with shape [200, 512, 512, 9]. The same should work for the other three datasets X_test, Y_train and Y_test.
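Something like the following resize-and-append pattern is what I have in mind, but I am not sure it is the idiomatic way. The file name, array shapes and contents below are dummy stand-ins (shrunk from (100, 512, 512, 9) to tiny sizes) so the sketch runs on its own; only X_train is shown, since the other three datasets would be handled the same way.

```python
import h5py
import numpy as np

# Dummy stand-ins for two consecutive batches from LoadIPV();
# real batches would have shape (100, 512, 512, 9).
first_batch = np.zeros((2, 4, 4, 9), dtype=np.float32)
next_batch = np.ones((3, 4, 4, 9), dtype=np.float32)

# Step 1: create the file with a resizable dataset (maxshape[0] = None).
with h5py.File('append_sketch.h5', 'w') as hf:
    hf.create_dataset("X_train", data=first_batch, maxshape=(None, 4, 4, 9))

# Step 2: reopen in append mode ('a' keeps existing contents) and grow it.
with h5py.File('append_sketch.h5', 'a') as hf:
    dset = hf["X_train"]
    old_size = dset.shape[0]
    # Grow along axis 0, which maxshape=(None, ...) allows ...
    dset.resize(old_size + next_batch.shape[0], axis=0)
    # ... then write the new batch into the newly added slots.
    dset[old_size:] = next_batch
    print(dset.shape)  # now (5, 4, 4, 9)
```

If this is correct, repeating step 2 for each batch of 100 images should grow X_train from [100, 512, 512, 9] to [200, 512, 512, 9] and so on.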