TensorFlow: How do the new contrib.data.Dataset objects work?

Question

In TensorFlow, the old input pipeline used a series of queues, with threads enqueuing and dequeuing elements from those queues. For example, string_input_producer acted as a queue for file names, tf.train.batch as a queue for batching, and so on.

Consequently, before training you had to write:

    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(sess=sess, coord=coord)

in order to create and start the threads that populate all of these queues.
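For reference, here is a minimal runnable sketch of that old pattern. It assumes TensorFlow 2.x, where the queue-based API is only reachable through tf.compat.v1, and it uses two hypothetical file names that are never opened (only the names pass through the queue).

```python
import tensorflow as tf

tf1 = tf.compat.v1  # the queue-based API now lives under compat.v1

with tf.Graph().as_default():
    # Hypothetical file names; they are only enqueued as strings, never read.
    filename_queue = tf1.train.string_input_producer(
        ["a.tfrecord", "b.tfrecord"], shuffle=False)
    filename = filename_queue.dequeue()

    with tf1.Session() as sess:
        # Without these two lines the dequeue below would block forever,
        # because no thread would be filling the queue.
        coord = tf1.train.Coordinator()
        threads = tf1.train.start_queue_runners(sess=sess, coord=coord)

        names = [sess.run(filename) for _ in range(2)]

        # Ask the background threads to stop and wait for them.
        coord.request_stop()
        coord.join(threads)
```

The point of the sketch is only that the queue is filled by background threads which the caller has to start explicitly.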

I updated my input pipeline from this old model to the new one, currently located in tf.contrib.data, using TFRecordDataset to read the TFRecord files that I train on.

I noticed that I can remove:

    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(sess=sess, coord=coord)

and the input pipeline still runs fine.
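A minimal sketch of that observation, assuming TensorFlow 2.x with the graph-mode API from tf.compat.v1. Dataset.range is used here as a stand-in for TFRecordDataset so that the example needs no files; that substitution is an assumption of this sketch, not part of my actual pipeline.

```python
import tensorflow as tf

tf1 = tf.compat.v1

with tf.Graph().as_default() as graph:
    # Stand-in for a TFRecordDataset pipeline (assumption: no real files).
    dataset = tf.data.Dataset.range(6).batch(2)
    iterator = tf1.data.make_one_shot_iterator(dataset)
    next_batch = iterator.get_next()

    # The Dataset pipeline registers no queue runners in the graph,
    # so start_queue_runners would have nothing to start.
    queue_runners = graph.get_collection(tf1.GraphKeys.QUEUE_RUNNERS)

    with tf1.Session() as sess:
        # No Coordinator and no start_queue_runners call.
        batches = [sess.run(next_batch).tolist() for _ in range(3)]
```

Here the session pulls batches directly from the iterator, and the QUEUE_RUNNERS collection stays empty.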

So my question is this:

How does the new input pipeline work under the hood? Does it use queues at all? Or does it use them and simply start them itself? Also, if it does use them, is there a way to monitor how full they are, since the old pipeline did this automatically and the new one does not?

tensorflow training-data

John scolaro Jul 12 '17 at 2:16

1 answer

Eleanor quint · Accepted Answer · 2019-08-16T16:51:50+0000

tl;dr: Queues are no longer used, since the input pipeline is now integrated into the TF graph. Iterator management happens deep in the code.

The standard way to get a data tensor from a tf.data.Dataset is to call next() on an iterator over the dataset (for example next(iter(dataset))), which gives you a tensor to use as the input to the first layer of the network. Under the hood, this builds an object called IteratorV2 [1]. Then some indirection routes the call to IteratorV2._next_internal [2], where it branches. If not executing eagerly, it calls gen_dataset_ops.iterator_get_next; otherwise it calls gen_dataset_ops.iterator_get_next_sync. gen_dataset_ops is a file generated at build time, so it is not on GitHub, but in my build it usually calls _pywrap_tensorflow.TFE_Py_FastPathExecute, which creates a node in the TF graph returning "A Tensor of type resource".
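A small sketch of both branches, assuming TensorFlow 2.x (eager execution on by default, graph mode via tf.compat.v1). The Python class of the iterator may differ between releases, so the sketch inspects only the values and the op types in the graph, not class names.

```python
import tensorflow as tf

# Eager branch: pull one element straight from a Python iterator.
dataset = tf.data.Dataset.range(3)
iterator = iter(dataset)
first_value = int(next(iterator).numpy())

# Graph branch: get_next() adds an op to the graph instead of returning data.
with tf.Graph().as_default() as graph:
    graph_dataset = tf.data.Dataset.range(3)
    graph_iterator = tf.compat.v1.data.make_one_shot_iterator(graph_dataset)
    next_element = graph_iterator.get_next()
    op_types = sorted({op.type for op in graph.get_operations()})

# The graph contains an iterator "get next" op and no queue ops.
has_get_next = "IteratorGetNext" in op_types
has_queue_op = any("Queue" in op_type for op_type in op_types)
```

Printing op_types for your own pipeline is a quick way to confirm that the data comes from iterator ops in the graph and not from queues.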

I can't find a way to monitor what is happening under the hood. IteratorV2 has no methods for this, and tf.data.Dataset is too high-level for it.
