I have a relatively large dataset (> 15 GB) stored in a single file as a Pandas DataFrame. I would like to convert this data to the TFRecords format and feed it into my computational graph later. I followed this guide: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/examples/how_tos/reading_data/convert_to_records.py .
However, this approach still involves loading the entire dataset into memory. Is there a way to convert large amounts of data to TFRecords directly, without loading everything into memory? Are TFRecords even needed in this context, or can I just read the arrays from disk during training?
The alternatives I have found use np.memmap or split the data into smaller chunks first, but I was wondering whether it is possible to convert the entire dataset to the TFRecord format in a single pass. A sketch of what I have in mind is below.
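For reference, this is roughly the chunk-by-chunk conversion I am imagining (a minimal sketch, not working code for my case: the file name, `label` column, and chunk size are placeholders, and it assumes the data can be re-read from a CSV rather than the single Pandas file I actually have):

```python
import numpy as np
import pandas as pd
import tensorflow as tf

def _bytes_feature(value):
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))

def _int64_feature(value):
    return tf.train.Feature(int64_list=tf.train.Int64List(value=[value]))

with tf.io.TFRecordWriter("data.tfrecords") as writer:
    # Read the source file in chunks so only one chunk is in memory at a time.
    for chunk in pd.read_csv("big_dataset.csv", chunksize=10_000):
        labels = chunk.pop("label").values          # assumes a "label" column
        features = chunk.values.astype(np.float32)  # remaining columns as floats
        for row, label in zip(features, labels):
            example = tf.train.Example(features=tf.train.Features(feature={
                "features": _bytes_feature(row.tobytes()),
                "label": _int64_feature(int(label)),
            }))
            writer.write(example.SerializeToString())
```

Is something along these lines the right approach, or is there a better way for data of this size?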