Variable-length RNN and masking gradients of padded steps

I create an RNN and use the sequence_length parameter to provide a list of lengths for the sequences in the batch, and all sequences in the batch are padded to the same length.

However, when doing backprop, is it possible to mask the gradients corresponding to the padded steps, so that these steps make 0 contribution to the weight updates? I already mask their respective costs, like this (where batch_weights is a vector of 0s and 1s, in which the elements corresponding to the padding steps are 0):

loss = tf.mul(
    tf.nn.sparse_softmax_cross_entropy_with_logits(logits, tf.reshape(self._targets, [-1])),
    batch_weights)
self._cost = cost = tf.reduce_sum(loss) / tf.to_float(tf.reduce_sum(batch_weights))

The problem is that, having done the above, I'm not sure whether the gradients from the padding steps have actually been zeroed out or not.
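One way to check is to ask TensorFlow for the gradient of the masked cost with respect to the padded inputs and inspect the rows past each sequence's true length. Below is a minimal, self-contained sketch of such a check, assuming a TF 1.x-style API and made-up names and shapes (inputs, lengths, targets), not the actual model from the question:

import numpy as np
import tensorflow as tf

batch, max_time, dim, num_units = 2, 5, 3, 4
inputs = tf.placeholder(tf.float32, [batch, max_time, dim])
lengths = tf.placeholder(tf.int32, [batch])
targets = tf.placeholder(tf.int32, [batch, max_time])

# In some TF versions this cell lives in tf.contrib.rnn instead.
cell = tf.nn.rnn_cell.BasicLSTMCell(num_units)
outputs, _ = tf.nn.dynamic_rnn(cell, inputs, sequence_length=lengths, dtype=tf.float32)

# Toy setup: treat the RNN outputs directly as logits over num_units classes.
weights = tf.sequence_mask(lengths, max_time, dtype=tf.float32)   # 0/1 mask, like batch_weights
loss = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=targets, logits=outputs)
cost = tf.reduce_sum(loss * weights) / tf.reduce_sum(weights)

# Gradient of the masked cost w.r.t. the inputs; rows with t >= lengths[i]
# should come out as all zeros if the padded steps really contribute nothing.
input_grad = tf.gradients(cost, inputs)[0]

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    g = sess.run(input_grad, {inputs: np.random.randn(batch, max_time, dim),
                              lengths: [5, 3],
                              targets: np.zeros((batch, max_time), np.int32)})
    print(g[1, 3:])   # padded steps of the second sequence: expected to be zeros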

1 answer

For all frame-wise / feed-forward (non-recurrent) operations, masking the losses/costs is enough.
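For illustration, here is a minimal sketch of that kind of loss masking, assuming the TF 1.x API and made-up shapes; tf.sequence_mask builds the 0/1 weights (the batch_weights from the question) directly from the sequence lengths:

import tensorflow as tf

# Toy shapes; logits would normally come from the RNN outputs.
batch, max_time, num_classes = 2, 5, 4
logits = tf.placeholder(tf.float32, [batch, max_time, num_classes])
targets = tf.placeholder(tf.int32, [batch, max_time])
lengths = tf.placeholder(tf.int32, [batch])

# 1 for real steps, 0 for padded steps -- same role as batch_weights above.
weights = tf.sequence_mask(lengths, max_time, dtype=tf.float32)

step_loss = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=targets, logits=logits)
cost = tf.reduce_sum(step_loss * weights) / tf.reduce_sum(weights)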

For all sequence / recurrent operations (e.g. dynamic_rnn), there is always a sequence_length parameter, which you need to set to the actual sequence lengths. Then there will be no gradient for the zero-padded steps, or in other words, those steps will have 0 contribution.
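A minimal sketch of the recurrent side, again assuming the TF 1.x API and made-up shapes: passing sequence_length to tf.nn.dynamic_rnn makes the cell stop at each sequence's true length, so the padded steps produce zero outputs and receive no gradient:

import tensorflow as tf

batch, max_time, dim = 2, 5, 3
inputs = tf.placeholder(tf.float32, [batch, max_time, dim])
lengths = tf.placeholder(tf.int32, [batch])      # e.g. [5, 3] for a padded batch

cell = tf.nn.rnn_cell.BasicLSTMCell(8)           # tf.contrib.rnn in some versions
# outputs[i, t, :] is all zeros for t >= lengths[i];
# the returned state is the one at step lengths[i] - 1.
outputs, state = tf.nn.dynamic_rnn(cell, inputs,
                                   sequence_length=lengths,
                                   dtype=tf.float32)

Combined with the loss masking above, this gives exactly the behaviour the question asks about: the padded steps contribute nothing to the weight updates.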


Source: https://habr.com/ru/post/1244248/

