Using the TensorFlow Dataset pipeline, how do I *name* the results of the `map` operation?

I have a map function below (runnable example) that takes a string as input and outputs a string and an integer.

In tf.data.Dataset.from_tensor_slices I named the source input 'filenames'. But when I return values from my map_element_counts map function, I can only return a tuple (returning a dictionary throws an exception).

Can I name the two elements returned from my map_element_counts function?

    import tensorflow as tf

    filelist = ['fileA_6', 'fileB_10', 'fileC_7']

    def map_element_counts(fname):
        # perform operations outside of tensorflow
        return 'test', 10

    ds = tf.data.Dataset.from_tensor_slices({'filenames': filelist})
    ds = ds.map(map_func=lambda x: tf.py_func(
        func=map_element_counts,
        inp=[x['filenames']],
        Tout=[tf.string, tf.int64]
    ))
    element = ds.make_one_shot_iterator().get_next()

    with tf.Session() as sess:
        print(sess.run(element))

Result:

 (b'test', 10) 

Desired Result:

    {'elementA': b'test', 'elementB': 10}

Added details:

When I return {'elementA': 'test', 'elementB': 10}, I get this exception:

 tensorflow.python.framework.errors_impl.UnimplementedError: Unsupported object type dict 
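
A minimal sketch of the workaround, anticipating the answers below: tf.py_func's Tout only accepts a flat list of dtypes, so the py_func itself has to return a flat tuple; the dict can then be assembled from the resulting tensors in the outer map function. The wrapper name named_map is hypothetical and assumes the same map_element_counts and ds as in the question's code:

    # Sketch only: keep the py_func returning a flat tuple and
    # rebuild the dict around the resulting tensors.
    def named_map(x):
        a, b = tf.py_func(func=map_element_counts,
                          inp=[x['filenames']],
                          Tout=[tf.string, tf.int64])
        return {'elementA': a, 'elementB': b}

    ds = ds.map(named_map)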
3 answers

tf.py_func works inside ds.map.

I created a very simple file as an example, which just contains the number 10.

dummy_file.txt:

 10 

Here is the script:

    import tensorflow as tf

    filelist = ['dummy_file.txt', 'dummy_file.txt', 'dummy_file.txt']

    def py_func(input):
        # perform operations outside of tensorflow
        parsed_txt_file = int(input)
        return 'test', parsed_txt_file

    def map_element_counts(fname):
        # let tensorflow read the text file
        file_string = tf.read_file(fname['filenames'])
        # then use the python function on the extracted string
        a, b = tf.py_func(
            func=py_func,
            inp=[file_string],
            Tout=[tf.string, tf.int64]
        )
        return {'elementA': a, 'elementB': b, 'file': fname['filenames']}

    ds = tf.data.Dataset.from_tensor_slices({'filenames': filelist})
    ds = ds.map(map_element_counts)
    element = ds.make_one_shot_iterator().get_next()

    with tf.Session() as sess:
        print(sess.run(element))
        print(sess.run(element))
        print(sess.run(element))

Output:

    {'file': b'dummy_file.txt', 'elementA': b'test', 'elementB': 10}
    {'file': b'dummy_file.txt', 'elementA': b'test', 'elementB': 10}
    {'file': b'dummy_file.txt', 'elementA': b'test', 'elementB': 10}
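
One caveat not covered in this answer: tensors coming out of tf.py_func carry no static shape information, so if the dataset is later batched it may be necessary to set the shapes explicitly. A minimal sketch, reusing the same map_element_counts and py_func as above with added set_shape calls:

    def map_element_counts(fname):
        file_string = tf.read_file(fname['filenames'])
        a, b = tf.py_func(func=py_func, inp=[file_string], Tout=[tf.string, tf.int64])
        # py_func outputs have unknown static shape; declare them as scalars
        a.set_shape([])
        b.set_shape([])
        return {'elementA': a, 'elementB': b, 'file': fname['filenames']}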

I am presenting the final solution to this issue for posterity. The code below is a copy/paste example that works under the most difficult conditions this question addresses (note that the other two answers are not copy/paste-ready examples):

Code Purpose:

  • Take a list of (large) files and split it into chunks (filename / index pairs)
  • Process each chunk using a map operation (generators are not a workable solution here: https://github.com/tensorflow/tensorflow/issues/16343)
  • Emit several samples from a map operation that accepts only 1 file / chunk as input.
  • Maintain naming of elements throughout the process.

Copy/paste working sample for TensorFlow 1.5 / Python 3.x

    import tensorflow as tf
    import numpy as np

    files = [b'testA', b'testB', b'testC']

    def mymap1(x):
        result_tensors = tf.py_func(func=mymap2, inp=[x], Tout=[tf.string, tf.int64])
        return {'filename': result_tensors[0], 'value': result_tensors[1]}

    def mymap2(x):
        return np.array([x, x, x]), np.array([10, 20, 30])

    def myflatmap(named_elements):
        return tf.data.Dataset.zip({
            'filename': tf.data.Dataset.from_tensor_slices(named_elements['filename']),
            'value': tf.data.Dataset.from_tensor_slices(named_elements['value'])
        })

    ds = tf.data.Dataset.from_tensor_slices(files)
    ds = ds.map(map_func=mymap1)
    ds = ds.flat_map(map_func=myflatmap)

    element = ds.make_one_shot_iterator().get_next()

    with tf.Session() as sess:
        for _ in range(9):
            print(sess.run(element))

Output:

    {'filename': b'testA', 'value': 10}
    {'filename': b'testA', 'value': 20}
    {'filename': b'testA', 'value': 30}
    {'filename': b'testB', 'value': 10}
    {'filename': b'testB', 'value': 20}
    {'filename': b'testB', 'value': 30}
    {'filename': b'testC', 'value': 10}
    {'filename': b'testC', 'value': 20}
    {'filename': b'testC', 'value': 30}
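
A follow-up note, not from the original answer: because every element of the flattened dataset is a dict, later transformations can refer to fields by name. A minimal sketch, assuming the ds built above:

    # keep only samples whose 'value' exceeds 15
    ds = ds.filter(lambda d: tf.greater(d['value'], 15))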

In this case, tf.py_func is not necessary, because the map_func passed to Dataset.map already works with dictionaries and other nested structures:

map_func: A function mapping a nested structure of tensors (having shapes and types defined by self.output_shapes and self.output_types) to another nested structure of tensors.

Here is an example:

    import tensorflow as tf

    filelist = ['fileA_6', 'fileB_10', 'fileC_7']

    def map_element_counts(fnames):
        return {'elementA': b'test', 'elementB': 10, 'file': fnames['filenames']}

    ds = tf.data.Dataset.from_tensor_slices({'filenames': filelist})
    ds = ds.map(map_func=map_element_counts)
    element = ds.make_one_shot_iterator().get_next()

    with tf.Session() as sess:
        print(sess.run(element))
        print(sess.run(element))
        print(sess.run(element))

Output:

    {'elementA': 'test', 'elementB': 10, 'file': 'fileA_6'}
    {'elementA': 'test', 'elementB': 10, 'file': 'fileB_10'}
    {'elementA': 'test', 'elementB': 10, 'file': 'fileC_7'}
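
Since the example filenames happen to encode a number after the underscore (e.g. 'fileA_6'), that value could even be extracted with plain TF string ops inside map_func. A hypothetical sketch (the 'count' field name and the parsing are assumptions, not part of the original answer):

    def map_element_counts(fnames):
        # split 'fileA_6' on '_' and parse the trailing number with TF ops
        parts = tf.string_split([fnames['filenames']], delimiter='_').values
        count = tf.string_to_number(parts[1], out_type=tf.int32)
        return {'file': fnames['filenames'], 'count': count}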