I have a relatively large dataset (> 15 GB) stored in a single file as a Pandas DataFrame. I would like to convert this data to the TFRecords format and feed it into my computational graph later. I followed this guide: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/examples/how_tos/reading_data/convert_to_records.py .
However, this approach still involves loading the entire dataset into memory. Is there a way to convert large amounts of data to TFRecords directly, without loading everything into memory? Are TFRecords even needed in this context, or can I just read the arrays from disk during training?
The alternatives I have found use np.memmap or split the data into smaller chunks first, but I was wondering whether it is possible to convert the entire dataset to the TFRecord format in a single pass. A sketch of what I have in mind is below.
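For reference, this is roughly the chunk-by-chunk conversion I am imagining (a minimal sketch, not working code for my case: the file name, `label` column, and chunk size are placeholders, and it assumes the data can be re-read from a CSV rather than the single Pandas file I actually have):

```python
import numpy as np
import pandas as pd
import tensorflow as tf

def _bytes_feature(value):
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))

def _int64_feature(value):
    return tf.train.Feature(int64_list=tf.train.Int64List(value=[value]))

with tf.io.TFRecordWriter("data.tfrecords") as writer:
    # Read the source file in chunks so only one chunk is in memory at a time.
    for chunk in pd.read_csv("big_dataset.csv", chunksize=10_000):
        labels = chunk.pop("label").values          # assumes a "label" column
        features = chunk.values.astype(np.float32)  # remaining columns as floats
        for row, label in zip(features, labels):
            example = tf.train.Example(features=tf.train.Features(feature={
                "features": _bytes_feature(row.tobytes()),
                "label": _int64_feature(int(label)),
            }))
            writer.write(example.SerializeToString())
```

Is something along these lines the right approach, or is there a better way for data of this size?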