Different workers see different parts of the data because each one dequeues its mini-batch of images from a single queue of preprocessed images. To train the Imagenet model in a distributed setting, the input images are preprocessed by multiple threads, and the preprocessed images are stored in a single RandomShuffleQueue. You can look for tf.RandomShuffleQueue in this file to see how it is done. The multiple workers are organized as "Inception towers," and each tower dequeues a mini-batch of images from that same queue, so each one receives a different part of the input. The picture here answers the second part of your question. Look for slim.variables.VariableDeviceChooser in this file. The logic there makes sure that Variable objects are assigned evenly to the workers acting as parameter servers. All the other workers, which do the actual training, fetch the variables at the beginning of a step and push updates at the end of the step.
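To make both mechanisms concrete, here is a minimal sketch in the TF 1.x graph API (the API the Inception codebase uses). The queue part shows why towers see disjoint data; the `RoundRobinVariableChooser` class is a hypothetical stand-in I wrote to illustrate the idea behind slim.variables.VariableDeviceChooser, not the actual slim implementation, and all names and sizes are illustrative:

```python
import tensorflow as tf  # TF 1.x API, matching the era of the Inception codebase

# Shared input queue: many preprocessing threads enqueue into it, and every
# tower dequeues from the *same* queue, so each tower gets different images.
images_queue = tf.RandomShuffleQueue(
    capacity=1000,
    min_after_dequeue=200,
    dtypes=[tf.float32],
    shapes=[[224, 224, 3]])

batch_size = 32
# Each "Inception tower" calls dequeue_many on the shared queue; the queue
# hands every caller a distinct mini-batch.
tower_batch = images_queue.dequeue_many(batch_size)

# Round-robin variable placement, in the spirit of
# slim.variables.VariableDeviceChooser (illustrative, not the real code).
class RoundRobinVariableChooser(object):
    def __init__(self, num_ps_tasks):
        self._num_ps_tasks = num_ps_tasks
        self._next_task = 0

    def __call__(self, op):
        # Pin Variable ops to parameter-server tasks in turn; leave all
        # other ops on the worker that created them.
        if op.type in ('Variable', 'VariableV2'):
            device = '/job:ps/task:%d' % self._next_task
            self._next_task = (self._next_task + 1) % self._num_ps_tasks
            return device
        return ''  # empty string = no device constraint

# tf.device accepts a function, which is invoked per op to pick its device.
with tf.device(RoundRobinVariableChooser(num_ps_tasks=2)):
    w = tf.Variable(tf.zeros([1024, 1000]))  # placed on /job:ps/task:0
    b = tf.Variable(tf.zeros([1000]))        # placed on /job:ps/task:1
```

Because the chooser spreads Variable ops evenly across parameter-server tasks, no single ps task becomes a bandwidth bottleneck when the training workers fetch variables at the start of a step and push updates at the end.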