Implementing model parallelism in TensorFlow

I am just starting out with TensorFlow. I am currently working on a system with two GPUs, each with 12 GB of memory, and I want to implement model parallelism across the two GPUs in order to train large models. I have looked through everything I could find on the Internet, SO, the TensorFlow documentation, etc. I could find explanations of model parallelism and its results, but not a small tutorial or small code snippets on how to implement it in TensorFlow. For example, should we exchange activations right after each layer? If so, how do we do that? Is there a concrete or cleaner way to implement model parallelism in TensorFlow? It would be very helpful if you could point me to a place where I can learn how to implement it, or to simple code, for example MNIST training on multiple GPUs using model parallelism.

Note: I have already done data parallelism, as in the CIFAR-10 multi-GPU tutorial, but I could not find any implementation of model parallelism.

1 answer

Here is an example. The model has some parts on GPU 0, some parts on GPU 1, and some parts on the CPU, so this is three-way model parallelism.

import tensorflow as tf

# One piece of the model on GPU 0.
with tf.device("/gpu:0"):
    a = tf.Variable(tf.ones(()))
    a = tf.square(a)
# Another piece on GPU 1.
with tf.device("/gpu:1"):
    b = tf.Variable(tf.ones(()))
    b = tf.square(b)
# The loss on the CPU; a and b are copied across devices here.
with tf.device("/cpu:0"):
    loss = a + b
opt = tf.train.GradientDescentOptimizer(learning_rate=0.1)
train_op = opt.minimize(loss)

sess = tf.Session()
sess.run(tf.global_variables_initializer())
for i in range(10):
    loss0, _ = sess.run([loss, train_op])
    print("loss", loss0)
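To make this more concrete for the MNIST case the question asks about, here is a hedged sketch of splitting a small classifier across two GPUs: the first dense layer sits on GPU 0, and only its activations cross the device boundary to GPU 1, where the second layer runs. The layer sizes and the random stand-in data are illustrative assumptions, not part of the original answer; `allow_soft_placement` lets the same graph fall back to the CPU when two GPUs are not available. It is written against the TF 1.x graph API (available as `tensorflow.compat.v1` on TF 2.x):

```python
import numpy as np
import tensorflow.compat.v1 as tf  # TF 1.x graph-mode API
tf.disable_eager_execution()

# Synthetic stand-in for MNIST batches (784 features, 10 classes).
x = tf.placeholder(tf.float32, [None, 784])
y = tf.placeholder(tf.int64, [None])

# First half of the model on GPU 0; its activations `h` are the only
# tensors that cross the device boundary.
with tf.device("/gpu:0"):
    h = tf.layers.dense(x, 256, activation=tf.nn.relu)

# Second half of the model on GPU 1.
with tf.device("/gpu:1"):
    logits = tf.layers.dense(h, 10)

# Loss and the training op on the CPU.
with tf.device("/cpu:0"):
    loss = tf.reduce_mean(
        tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y, logits=logits))
train_op = tf.train.GradientDescentOptimizer(0.5).minimize(loss)

# allow_soft_placement lets TensorFlow fall back to available devices
# when fewer than two GPUs are present, so the sketch runs anywhere.
config = tf.ConfigProto(allow_soft_placement=True)
with tf.Session(config=config) as sess:
    sess.run(tf.global_variables_initializer())
    xs = np.random.rand(32, 784).astype(np.float32)
    ys = np.random.randint(0, 10, size=32)
    for step in range(5):
        cur_loss, _ = sess.run([loss, train_op], {x: xs, y: ys})
        print("step", step, "loss", cur_loss)
```

Passing `log_device_placement=True` in the `ConfigProto` will print where each op actually ran, which is a quick way to confirm the split.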

Source: https://habr.com/ru/post/1668933/
