I have something equivalent to a sparse softmax:
...
with tf.device('/gpu:0'):
    # indices of the sampled output nodes for each example in the batch
    indices = tf.placeholder(tf.int32, [None, dimsize])
    self._W = weight_variable([self._num_nodes, input_layer_size])
    self._b = bias_variable([self._num_nodes])
    # gather only the sampled rows of W and b
    sampled_W = tf.transpose(tf.nn.embedding_lookup(self._W, indices), [0, 2, 1])
    sampled_b = tf.nn.embedding_lookup(self._b, indices)
...
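(weight_variable and bias_variable are just small variable-creation helpers in the style of the TensorFlow tutorials; roughly the following, though the exact initializers don't matter for the question:)

def weight_variable(shape):
    # illustrative initializer only
    return tf.Variable(tf.truncated_normal(shape, stddev=0.1))

def bias_variable(shape):
    # illustrative initializer only
    return tf.Variable(tf.constant(0.1, shape=shape))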
However, when I turn on device placement logging, I see that several of the gradient ops are placed on the CPU, for example:
I tensorflow/core/common_runtime/simple_placer.cc:819] gradients/.../embedding_lookup_1_grad/Size: /job:localhost/replica:0/task:0/cpu:0
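(I enable the placement logging through the session config, along these lines:)

# log_device_placement prints the device chosen for every op in the graph
config = tf.ConfigProto(log_device_placement=True)
sess = tf.Session(config=config)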
This happens no matter which optimizer I choose. Did I miss something?
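The training op itself is just a stock optimizer applied to the loss computed from sampled_W and sampled_b; for example (the optimizer and learning rate are interchangeable here):

# Any tf.train optimizer shows the same CPU placement for the embedding_lookup gradients.
optimizer = tf.train.AdamOptimizer(learning_rate=0.001)
train_op = optimizer.minimize(loss)  # 'loss' is the sparse-softmax-style loss built from sampled_W / sampled_b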