I am posting the final solution to this issue for posterity. The code below is a copy/paste example that works under the most difficult conditions this question addresses (note that the other two answers do not contain copy/paste-ready code):
Code Purpose:
- Take a list of (large) files and split it into chunks (file names / index pairs)
- Process each chunk using a map operation (generators are not a workable solution here: https://github.com/tensorflow/tensorflow/issues/16343 )
- Yield multiple samples from a map operation that takes only 1 file / chunk as input.
- Maintain naming of elements throughout the process.
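The steps above can be sketched in plain Python, without TensorFlow, to show the intended data flow. This is only an analogy under stated assumptions: `split_into_chunks`, `samples_from_chunk`, and `flat_map` are hypothetical helper names (not TensorFlow APIs), and the values 10, 20, 30 are placeholders mirroring the sample in this answer.

```python
from itertools import chain


def split_into_chunks(filenames, chunks_per_file):
    # Hypothetical helper: pair every filename with each chunk index.
    return [(name, idx) for name in filenames for idx in range(chunks_per_file)]


def samples_from_chunk(chunk):
    # Hypothetical "map" step: one chunk in, several named samples out.
    name, idx = chunk
    return [{'filename': name, 'chunk': idx, 'value': v} for v in (10, 20, 30)]


def flat_map(func, items):
    # Apply func to each item and flatten the per-item lists into one stream.
    return chain.from_iterable(func(item) for item in items)


chunks = split_into_chunks([b'testA', b'testB'], chunks_per_file=1)
for sample in flat_map(samples_from_chunk, chunks):
    print(sample)
```

Each sample keeps its `filename` key all the way through, which is the property the TensorFlow version achieves by zipping named datasets inside `flat_map`.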
Copy/paste working sample for TensorFlow 1.5 / Python 3.x:
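    import tensorflow as tf
    import numpy as np

    files = [b'testA', b'testB', b'testC']


    def mymap1(x):
        # Wrap the plain-Python function so it can run inside the tf.data pipeline.
        result_tensors = tf.py_func(func=mymap2, inp=[x], Tout=[tf.string, tf.int64])
        return {'filename': result_tensors[0], 'value': result_tensors[1]}


    def mymap2(x):
        # One input file / chunk produces three samples, each tagged with its filename.
        # dtype is set explicitly so the array matches Tout=tf.int64 on every platform.
        return np.array([x, x, x]), np.array([10, 20, 30], dtype=np.int64)


    def myflatmap(named_elements):
        # Split the per-file arrays into individual elements while keeping the keys.
        return tf.data.Dataset.zip({
            'filename': tf.data.Dataset.from_tensor_slices(named_elements['filename']),
            'value': tf.data.Dataset.from_tensor_slices(named_elements['value'])
        })


    ds = tf.data.Dataset.from_tensor_slices(files)
    ds = ds.map(map_func=mymap1)
    ds = ds.flat_map(map_func=myflatmap)

    element = ds.make_one_shot_iterator().get_next()

    with tf.Session() as sess:
        # 3 files x 3 samples per file = 9 elements.
        for _ in range(9):
            print(sess.run(element))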
Output:
    {'filename': b'testA', 'value': 10}
    {'filename': b'testA', 'value': 20}
    {'filename': b'testA', 'value': 30}
    {'filename': b'testB', 'value': 10}
    {'filename': b'testB', 'value': 20}
    {'filename': b'testB', 'value': 30}
    {'filename': b'testC', 'value': 10}
    {'filename': b'testC', 'value': 20}
    {'filename': b'testC', 'value': 30}