Imagine a tiny network defined as follows, where linear is a typical helper function that creates the TensorFlow variables for the weight matrix and applies the given activation function:
final_layer = linear(linear(_input, 10, tf.nn.tanh), 20)
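For reference, here is a minimal NumPy sketch of what such a linear helper is assumed to do (the real helper would create trainable tf.Variable objects instead; the names and initialization here are hypothetical, not from the question):

```python
import numpy as np

def linear(x, out_dim, activation=None):
    # Hypothetical stand-in for the TF helper: y = x @ W + b,
    # optionally followed by an activation function.
    in_dim = x.shape[-1]
    W = np.random.randn(in_dim, out_dim) * 0.1  # in TF: a trainable tf.Variable
    b = np.zeros(out_dim)                       # in TF: a trainable tf.Variable
    y = x @ W + b
    return activation(y) if activation is not None else y

# Shape check mirroring the network in the question.
x = np.random.randn(4, 8)         # batch of 4 inputs with 8 features
hidden = linear(x, 10, np.tanh)   # shape (4, 10)
out = linear(hidden, 20)          # shape (4, 20)
```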
Normally this would be trained by gradient descent on a loss:
loss = tf.reduce_sum(tf.square(final_layer - _target))
train_step = tf.train.AdamOptimizer().minimize(loss)
But suppose I am given the derivatives of the loss with respect to final_layer from an external source (e.g. a tf.placeholder named _deriv). How can I use this gradient information with one of the built-in optimizers to backpropagate and update the network parameters?
The workaround I am currently using is to construct an artificial loss as the inner product of _deriv and final_layer (since the derivative of this loss with respect to final_layer equals _deriv):
loss = tf.reduce_sum(final_layer*_deriv)
train_step = tf.train.AdamOptimizer().minimize(loss)
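To double-check the identity the workaround relies on: the gradient of sum(final_layer * _deriv) with respect to final_layer is exactly _deriv. A small NumPy finite-difference check, purely illustrative (the array shapes are arbitrary):

```python
import numpy as np

def artificial_loss(final_layer, deriv):
    # The workaround's surrogate loss: an inner product with the external gradient.
    return np.sum(final_layer * deriv)

rng = np.random.default_rng(0)
final_layer = rng.normal(size=(3, 5))
deriv = rng.normal(size=(3, 5))  # stands in for the externally supplied gradient

# Finite-difference gradient of the surrogate loss w.r.t. final_layer.
eps = 1e-6
grad = np.zeros_like(final_layer)
for idx in np.ndindex(final_layer.shape):
    bumped = final_layer.copy()
    bumped[idx] += eps
    grad[idx] = (artificial_loss(bumped, deriv)
                 - artificial_loss(final_layer, deriv)) / eps

print(np.allclose(grad, deriv, atol=1e-4))  # True: the surrogate's gradient is _deriv
```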
This is very wasteful: at every training step TensorFlow has to build this unnecessary inner product and compute its derivative, even though I already know that gradient. Is there a better way?
For those who find this a strange thing to want: I need it in order to implement synthetic gradients.