I am currently working with TensorFlow in a multi-GPU setup, training a model on several GPUs using multiple towers, as in the multi-GPU part of https://www.tensorflow.org/tutorials/deep_cnn .
All model weights are shared between the towers and are therefore placed in CPU memory. To achieve this, an explicit device placement is used everywhere in the code where a variable is created or reused (with tf.get_variable), roughly as shown below.
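A minimal sketch of what I mean, following the pattern from the tutorial linked above (the helper name `variable_on_cpu` is just illustrative, assuming TF 1.x):

```python
import tensorflow as tf

def variable_on_cpu(name, shape, initializer):
    # Pin the variable itself to CPU memory so every GPU tower
    # shares the same underlying weights.
    with tf.device('/cpu:0'):
        var = tf.get_variable(name, shape, dtype=tf.float32,
                              initializer=initializer)
    return var

# Inside each tower the variables are created/reused under a shared scope,
# while the tower's compute ops sit on that tower's GPU:
for i in range(2):  # number of GPUs, illustrative
    with tf.device('/gpu:%d' % i):
        with tf.variable_scope('model', reuse=tf.AUTO_REUSE):
            w = variable_on_cpu('weights', [1024, 1024],
                                tf.truncated_normal_initializer(stddev=0.05))
```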
I was looking for a more convenient way to place all variables on the CPU and stumbled upon the caching_device argument of variable_scope, and I wondered whether this is what I'm looking for. I'm still not sure, though: in the resulting graph, the weight variables have their ops placed on the GPU (matching the device in effect when they were created), plus a read operation on the CPU (see the sketch of what I tried after this paragraph).
Does anyone have information on how caching_device is actually meant to be used, what really happens under the hood, and where the variable itself ends up being located?
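A minimal sketch of the experiment, assuming TF 1.x (the scope and variable names are arbitrary):

```python
import tensorflow as tf

# Open a variable scope with caching_device='/cpu:0' while the surrounding
# device is a GPU, then inspect where the variable and its read end up.
with tf.device('/gpu:0'):
    with tf.variable_scope('tower_0', caching_device='/cpu:0'):
        w = tf.get_variable('weights', shape=[1024, 1024],
                            initializer=tf.zeros_initializer())

print(w.device)          # device of the variable op itself (GPU in my case)
print(w.value().device)  # device of the cached read/snapshot (CPU)
```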
Thanks.