TensorFlow Input Pipeline for Distributed Training

I am trying to figure out how to set up my input pipeline for TensorFlow in a distributed training setup. Is it better to have readers in a single process that send data to all the workers, or should each worker launch its own input pipeline? And if each worker reads for itself, how do we guarantee that every worker gets different input data?

1 answer

I will give an example of how I do this:

import tensorflow as tf
batch_size = 50
task_index = 2      # index of this worker within the worker job
num_workers = 10    # total number of workers
num_epochs = 5      # passes over the data (illustrative value; used by string_input_producer below)
input_pattern = "gs://bucket/dir/part-00*"

Get all the file names in the bucket that match input_pattern:

files_names = tf.train.match_filenames_once(
                input_pattern, name = "myFiles")

Select the file names that belong to worker task_index. tf.strided_slice works like slicing a Python list: it keeps every num_workers-th file starting at offset task_index (the equivalent of files_names[task_index::num_workers]), so each worker gets its own disjoint set of files.

to_process = tf.strided_slice(files_names, [task_index],
                 [999999999], strides=[num_workers])
filename_queue = tf.train.string_input_producer(to_process,
                     shuffle=True,  # shuffle the order of the files
                     num_epochs=num_epochs)
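
To see what the strided slice is doing, here is the same sharding written with plain Python list slicing (a standalone illustration with made-up file names and three workers, separate from the pipeline code above):

files = ["part-000", "part-001", "part-002",
         "part-003", "part-004", "part-005"]
n_workers = 3
for worker in range(n_workers):
    # equivalent to tf.strided_slice(files, [worker], [999999999], strides=[n_workers])
    print(worker, files[worker::n_workers])
# 0 ['part-000', 'part-003']
# 1 ['part-001', 'part-004']
# 2 ['part-002', 'part-005']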

# Read the shard's files line by line; each line holds two tab-separated integer columns
reader = tf.TextLineReader()
_, value = reader.read(filename_queue)
col1, col2 = tf.decode_csv(value,
        record_defaults=[[1], [1]], field_delim="\t")
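
For reference, this is how tf.decode_csv behaves on a single line (a minimal sketch; the sample value "3\t7" and the assumption of two tab-separated integers per line are mine, inferred from record_defaults):

sample = tf.constant("3\t7")
a, b = tf.decode_csv(sample, record_defaults=[[1], [1]], field_delim="\t")
with tf.Session() as sess:
    print(sess.run([a, b]))   # [3, 7]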

# Batch and shuffle the examples read by this worker
train_inputs, train_labels = tf.train.shuffle_batch([col1, [col2]],
        batch_size=batch_size,
        capacity=50*batch_size,
        num_threads=10,
        min_after_dequeue=10*batch_size,
        allow_smaller_final_batch=True)

loss = f(...,train_inputs, train_labels)
optimizer = ...

with tf.train.MonitoredTrainingSession(...) as mon_sess:
    coord = tf.train.Coordinator()
    with coord.stop_on_exception():
        _ = tf.train.start_queue_runners(sess=mon_sess, coord=coord)
        while not coord.should_stop() and not mon_sess.should_stop():
            mon_sess.run(optimizer)
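
To answer the original question directly: each worker runs its own copy of this input pipeline with its own task_index, and the strided slice above is what keeps the file shards disjoint; there is no single reader process feeding all the workers. Below is a rough sketch of where task_index comes from in a distributed job (the cluster layout, host names, and ports are placeholders, not part of the answer above):

cluster = tf.train.ClusterSpec({
    "ps": ["ps0.example.com:2222"],
    "worker": ["worker0.example.com:2222", "worker1.example.com:2222"],
})
server = tf.train.Server(cluster, job_name="worker", task_index=task_index)

with tf.device(tf.train.replica_device_setter(
        worker_device="/job:worker/task:%d" % task_index,
        cluster=cluster)):
    pass  # build the input pipeline and model exactly as above, using this worker's task_index

with tf.train.MonitoredTrainingSession(master=server.target,
                                       is_chief=(task_index == 0)) as mon_sess:
    pass  # run the training loop exactly as above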

Further reading on TensorFlow input pipelines: http://web.stanford.edu/class/cs20si/lectures/notes_09.pdf


Source: https://habr.com/ru/post/1682145/
