How to use tf.data.Dataset.apply() to modify a dataset

I work with time series models in TensorFlow. My dataset contains physical signals, and I need to split these signals into windows before feeding them into my model.

This is how I read the data and cut it:

```python
import tensorflow as tf
import numpy as np

def _ds_slicer(data):
    win_len = 768  # number of windows per signal (24576 / 32)
    return {"mix": tf.stack(tf.split(data["mix"], win_len)),
            "pure": tf.stack(tf.split(data["pure"], win_len))}

dataset = tf.data.Dataset.from_tensor_slices({
    "mix":  np.random.uniform(0, 1, [1000, 24576]),
    "pure": np.random.uniform(0, 1, [1000, 24576]),
})
dataset = dataset.map(_ds_slicer)
print(dataset.output_shapes)
# {'mix': TensorShape([Dimension(768), Dimension(32)]),
#  'pure': TensorShape([Dimension(768), Dimension(32)])}
```

I want to transform this dataset so that each element is a single window, i.e. {'mix': TensorShape([Dimension(32)]), 'pure': TensorShape([Dimension(32)])}.

An equivalent conversion to numpy would look like this:

```python
signal = np.random.uniform(0, 1, [1000, 24576])
sliced_sig = np.stack(np.split(signal, 768, axis=1), axis=1)
print(sliced_sig.shape)  # (1000, 768, 32)
sliced_sig = sliced_sig.reshape(-1, sliced_sig.shape[-1])
print(sliced_sig.shape)  # (768000, 32)
```
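Incidentally, because the 32-sample windows are contiguous along the last axis, the split/stack/reshape above collapses to a single reshape. A quick sanity check (variable names here are illustrative, not from the original code):

```python
import numpy as np

signal = np.random.uniform(0, 1, [1000, 24576])

# Split each row into 768 windows of 32 samples, stack them,
# then flatten the (signal, window) axes into one.
sliced = np.stack(np.split(signal, 768, axis=1), axis=1).reshape(-1, 32)
print(sliced.shape)  # (768000, 32)

# Row-major layout means a direct reshape gives the same result.
assert np.array_equal(sliced, signal.reshape(-1, 32))
```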

I was thinking of using tf.contrib.data.group_by_window as the input to dataset.apply(), but I couldn't work out exactly how to use it. Is there a way to apply a custom transformation to a dataset?


I think you are just looking for the tf.contrib.data.unbatch transformation. It does exactly what you want:

```python
x = np.zeros((1000, 768, 32))
dataset = tf.data.Dataset.from_tensor_slices(x)
print(dataset.output_shapes)  # (768, 32)
dataset = dataset.apply(tf.contrib.data.unbatch())
print(dataset.output_shapes)  # (32,)
```

From the documentation:

If elements of the dataset are shaped [B, a0, a1, ...], where B may vary from element to element, then for each element in the dataset, the unbatched dataset will contain B consecutive elements of shape [a0, a1, ...].


Edit for TF 2.0

(Thanks @DavidParks)

With TF 2.0 you can use tf.data.Dataset.unbatch directly:

```python
x = np.zeros((1000, 768, 32))
dataset = tf.data.Dataset.from_tensor_slices(x)
# output_shapes was removed from the v2 API; use element_spec instead.
print(dataset.element_spec.shape)  # (768, 32)
dataset = dataset.unbatch()
print(dataset.element_spec.shape)  # (32,)
```
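Putting this together for the original question, the map + unbatch combination yields the desired per-window elements. A sketch assuming TF 2.x (N_WINS is an illustrative name standing in for the question's win_len):

```python
import numpy as np
import tensorflow as tf

N_WINS = 768  # windows per signal (24576 / 32)

def _ds_slicer(data):
    # Cut each 24576-sample signal into 768 windows of 32 samples.
    return {k: tf.stack(tf.split(v, N_WINS)) for k, v in data.items()}

dataset = tf.data.Dataset.from_tensor_slices({
    "mix":  np.random.uniform(0, 1, [1000, 24576]),
    "pure": np.random.uniform(0, 1, [1000, 24576]),
})
dataset = dataset.map(_ds_slicer).unbatch()
print(dataset.element_spec["mix"].shape)  # (32,)
```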

Source: https://habr.com/ru/post/1274914/
