Sunreef's answer is absolutely correct. You may still be wondering why we introduced Dataset.apply() , and I thought I would offer some background.
The tf.data API has a set of core transformations - for example, Dataset.map() and Dataset.filter() - that are generally useful across a wide range of datasets, are unlikely to change, and are implemented as methods on the tf.data.Dataset object. In particular, they are subject to the same backwards-compatibility guarantees as other core TensorFlow APIs.
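To make that concrete, here is a minimal sketch of how those core transformations chain as methods on a Dataset; the toy data and the lambdas are placeholders of my own choosing, not part of the original example:

import tensorflow as tf

# Core transformations are methods, so they compose with plain method chaining.
dataset = (tf.data.Dataset.from_tensor_slices([1, 2, 3, 4, 5])
           .map(lambda x: x * 2)      # 2, 4, 6, 8, 10
           .filter(lambda x: x > 4))  # 6, 8, 10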
However, this core approach is a bit restrictive. We also want the freedom to experiment with new transformations before adding them to the core, and to let other library developers create their own reusable transformations. Therefore, in TensorFlow 1.4 we split out a set of custom transformations that live in tf.contrib.data . The custom transformations include some with very specific functionality (e.g. tf.contrib.data.sloppy_interleave() ) and some where the API is still in flux (e.g. tf.contrib.data.group_by_window() ). Originally, we implemented these custom transformations as functions from Dataset to Dataset , which hurt the syntactic flow of a pipeline. For instance:
dataset = custom_transformation(tf.data.TFRecordDataset(...).map(...), x, y, z)
dataset = dataset.shuffle(...).repeat(...).batch(...)
Since this appeared to be a common pattern, we added Dataset.apply() as a way to chain core and custom transformations in a single pipeline:
dataset = (tf.data.TFRecordDataset(...)
           .map(...)
           .apply(custom_transformation(x, y, z))
           .shuffle(...)
           .repeat(...)
           .batch(...))
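For completeness, here is a rough sketch of what such a reusable transformation might look like. The name custom_transformation and its parameters x, y, and z are placeholders carried over from the example above, and the body is just one arbitrary choice of a function from Dataset to Dataset:

import tensorflow as tf

def custom_transformation(x, y, z):
  # Dataset.apply() accepts any function that maps a Dataset to a Dataset,
  # so a reusable transformation is typically written as a factory that
  # returns such a function.
  def _apply_fn(dataset):
    return dataset.filter(lambda record: record > x).batch(y).prefetch(z)
  return _apply_fn

dataset = tf.data.Dataset.range(100).apply(custom_transformation(10, 4, 1))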
This is a minor feature in the grand scheme of things, but I hope it makes tf.data programs easier to read and the library easier to extend.