I would like to use several GPUs to train my TensorFlow model with data parallelism.
I am currently training the model with the following approach:
    import tensorflow as tf

    x_ = tf.placeholder(...)
    y_ = tf.placeholder(...)
    y = model(x_)
    loss = tf.losses.sparse_softmax_cross_entropy(labels=y_, logits=y)
    optimizer = tf.train.AdamOptimizer()
    train_op = tf.contrib.training.create_train_op(loss, optimizer)

    sess = tf.Session()
    sess.run(tf.global_variables_initializer())

    for epoch in range(epochs):
        for bx, by in data:
            _ = sess.run(train_op, feed_dict={x_: bx, y_: by})
I would like to take advantage of several GPUs to train this model with data parallelism. That is, I would like to split my batches in half and run each half of the batch on one of my two GPUs.
cifar10_multi_gpu_train.py seems to be a good example of computing losses from towers of the graph running on multiple GPUs, but I have not found good examples of this style of training when using feed_dict and placeholders rather than a data-loading queue.
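For concreteness, here is a minimal sketch of what I am after: keep a single pair of placeholders fed through feed_dict, shard the fed batch with tf.split, and build one tower per GPU with shared variables. The model() body, input_dim and num_classes below are just hypothetical stand-ins for this sketch, not my real code.

    import tensorflow as tf

    num_gpus = 2        # two GPUs, as described above
    input_dim = 784     # hypothetical feature size, only for this sketch
    num_classes = 10    # hypothetical number of classes, only for this sketch

    def model(x):
        # hypothetical stand-in for the real model: a single linear layer
        w = tf.get_variable('w', [input_dim, num_classes])
        b = tf.get_variable('b', [num_classes], initializer=tf.zeros_initializer())
        return tf.matmul(x, w) + b

    x_ = tf.placeholder(tf.float32, [None, input_dim])
    y_ = tf.placeholder(tf.int64, [None])

    # Shard the fed batch so that each GPU sees 1/num_gpus of it.
    x_shards = tf.split(x_, num_gpus, axis=0)
    y_shards = tf.split(y_, num_gpus, axis=0)

    tower_losses = []
    for i in range(num_gpus):
        with tf.device('/gpu:%d' % i):
            # Reuse the same variables in every tower after the first one.
            with tf.variable_scope('model', reuse=(i > 0)):
                logits = model(x_shards[i])
            tower_losses.append(
                tf.losses.sparse_softmax_cross_entropy(labels=y_shards[i],
                                                       logits=logits))

    # Mean of the per-tower losses, e.g. for logging.
    loss = tf.reduce_mean(tf.stack(tower_losses))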
UPDATE
It seems that https://timsainb.imtqy.com/multi-gpu-vae-gan-in-tensorflow.html can serve as a good example. It pulls average_gradients from cifar10_multi_gpu_train.py and creates a single placeholder, which is then sliced into a piece for each GPU. I think you also need to split create_train_op into three steps: compute_gradients, average_gradients, and then apply_gradients.
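Concretely (untested), I think the three-step replacement for create_train_op would look roughly like this, continuing from the tower_losses built in the sketch above; average_gradients follows the pattern of the version in cifar10_multi_gpu_train.py:

    def average_gradients(tower_grads):
        # tower_grads holds one list of (gradient, variable) pairs per GPU.
        # Assumes every variable receives a dense gradient in every tower.
        averaged = []
        for grads_and_vars in zip(*tower_grads):
            grads = tf.stack([g for g, _ in grads_and_vars], axis=0)
            grad = tf.reduce_mean(grads, axis=0)
            _, var = grads_and_vars[0]   # the variables are shared between towers
            averaged.append((grad, var))
        return averaged

    optimizer = tf.train.AdamOptimizer()

    # Step 1: compute the gradients once per tower; colocating the backprop ops
    # with the forward ops keeps each tower's gradient computation on its own GPU.
    tower_grads = [optimizer.compute_gradients(l, colocate_gradients_with_ops=True)
                   for l in tower_losses]

    # Step 2: average the per-tower gradients.
    grads = average_gradients(tower_grads)

    # Step 3: apply the averaged gradients once to the shared variables.
    train_op = optimizer.apply_gradients(grads)

The feed_dict training loop itself should not need to change, since tf.split distributes each fed batch across the GPUs. Like the cifar10 example, I expect the session has to be created with allow_soft_placement=True so that ops without a GPU kernel can fall back to the CPU.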