I create an RNN and use the sequence_length parameter to supply a list of the true lengths of the sequences in the batch; all sequences in the batch are padded to the same length.
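For context, the setup I mean looks roughly like this (a minimal sketch in old TF style; cell, inputs, seq_lengths, and the size constants are placeholder names, not my exact code):

import tensorflow as tf

batch_size, max_steps, input_dim = 32, 50, 100

# One padded batch: every example occupies max_steps, but the real lengths vary.
inputs = tf.placeholder(tf.float32, [batch_size, max_steps, input_dim])
seq_lengths = tf.placeholder(tf.int32, [batch_size])

cell = tf.nn.rnn_cell.BasicLSTMCell(128)

# sequence_length makes dynamic_rnn stop stepping each example at its true length;
# outputs beyond that point come back as zeros and the state is copied through.
outputs, state = tf.nn.dynamic_rnn(cell, inputs, sequence_length=seq_lengths, dtype=tf.float32)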
However, when doing backprop, is it possible to mask the gradients corresponding to the padding steps, so that these steps contribute 0 to the weight updates? I already mask their respective costs like this (where batch_weights is a vector of 0s and 1s, with 0 at the elements corresponding to the padding steps):
loss = tf.mul(tf.nn.sparse_softmax_cross_entropy_with_logits(logits, tf.reshape(self._targets, [-1])), batch_weights)
self._cost = cost = tf.reduce_sum(loss) / tf.to_float(tf.reduce_sum(batch_weights))
The problem is that I'm not sure, having done the above, whether the gradients coming from the padding steps are actually zeroed out or not.
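One way I thought of checking this (just a sketch; grads_wrt_logits and batch_weights_vals are illustrative names): take the gradient of the masked cost with respect to the logits and look at the rows that belong to padding steps.

# If the cost masking is enough, the gradient rows for padding steps should be all zeros.
grads_wrt_logits = tf.gradients(self._cost, [logits])[0]

# Inside a session, with a batch that actually contains padding:
# grad_vals = sess.run(grads_wrt_logits, feed_dict=feed)
# padded_rows = grad_vals[batch_weights_vals == 0]   # numpy boolean indexing on padded positions
# print(abs(padded_rows).max())                      # expect 0.0 if padding steps contribute nothing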