How to set a layer's gradient directly before backpropagation?

Imagine a tiny network defined as follows, where linear is a typical helper function that creates the TensorFlow variables for the weight matrix and applies the activation function:

final_layer = linear(linear(_input,10,tf.nn.tanh),20)
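
For reference, a minimal TF1-style sketch of what linear might look like (not my actual implementation; the names are illustrative):

import tensorflow as tf

def linear(x, out_dim, activation=None):
    # Dense-layer helper: weight matrix, bias, optional activation.
    in_dim = x.get_shape().as_list()[-1]
    W = tf.Variable(tf.random_normal([in_dim, out_dim], stddev=0.1))
    b = tf.Variable(tf.zeros([out_dim]))
    y = tf.matmul(x, W) + b
    return activation(y) if activation is not None else y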

Normally this would be optimized with gradient descent on a loss:

loss = tf.reduce_sum(tf.square(final_layer - _target))
train_step = tf.train.AdamOptimizer().minimize(loss)

But suppose I get the derivatives of the loss with respect to final_layer from an external source (e.g. a tf.placeholder named _deriv). How can I use this gradient information with one of the built-in optimizers to backpropagate and update the network parameters?

The workaround I am currently using is to create an artificial loss consisting of the inner product between _deriv and final_layer (since the derivative of this loss with respect to final_layer is equal to _deriv).

loss = tf.reduce_sum(final_layer*_deriv)
train_step = tf.train.AdamOptimizer().minimize(loss)

This is very wasteful, because it builds this unnecessary inner product and computes its derivative at every training step, even though I already know that gradient. Is there a better way?

For those who find this a strange thing to want: I am implementing synthetic gradients.

1 answer

tf.gradients has a grad_ys argument that does exactly this: tf.gradients([final_layer], list_of_variables, grad_ys=[_deriv]) backpropagates starting from _deriv instead of differentiating an actual loss.

Unfortunately, the built-in optimizers do not pass grad_ys through to tf.gradients, so you may have to work around their compute_gradients step, e.g. by computing the gradients yourself as above and handing them to apply_gradients.
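
A minimal sketch of that approach (assuming the same linear helper and placeholders as in the question; shapes and names are illustrative):

import tensorflow as tf

_input = tf.placeholder(tf.float32, [None, 5])
_deriv = tf.placeholder(tf.float32, [None, 20])  # externally supplied d(loss)/d(final_layer)

final_layer = linear(linear(_input, 10, tf.nn.tanh), 20)

# Backpropagate starting from _deriv instead of differentiating a real loss.
variables = tf.trainable_variables()
grads = tf.gradients([final_layer], variables, grad_ys=[_deriv])

# Hand the precomputed gradients directly to the optimizer.
optimizer = tf.train.AdamOptimizer()
train_step = optimizer.apply_gradients(zip(grads, variables))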


Source: https://habr.com/ru/post/1657319/

