Using the TensorFlow grad_loss / grad_ys parameter to add gradients

I am trying to use the grad_loss parameter in optimizer.minimize(loss, grad_loss=...) to modify the gradients of a network using existing gradients. I followed the comments here: Using grads_ys parameter in tf.gradients - TensorFlow

and I would like to run a toy example in which I recreate the default value of 1 for grad_ys, as indicated in the documentation.

Here is the corresponding code segment:

    grads_and_vars = optimizer.compute_gradients(loss_op)
    vars_with_grad = [v for g, v in grads_and_vars if g is not None]
    grad_loss = []
    for grad, var in grads_and_vars:
        grad_loss.append(tf.ones_like(grad))
    train_op = optimizer.minimize(loss_op, grad_loss=grad_loss)

The first part extracts the gradients using compute_gradients. The last line minimizes the loss loss_op, but tries to use vectors filled with ones as the gradients. As far as I understand, this should behave the same as running minimize without the grad_loss parameter.

Unfortunately, this fails because grad_loss is expected to be a tensor (with a dtype), not a list. Looking at gradients_impl.py, I see that grad_loss is expected to have the same dimensions as loss (which in this case is a scalar).
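To make that shape requirement concrete, here is a minimal toy setup (the variable and loss below are made up for illustration, not my actual network) showing what a shape-compatible grad_loss would apparently have to look like when the loss is a scalar:

    import tensorflow as tf

    # Toy setup for illustration only (names are made up).
    w = tf.Variable([1.0, 2.0])
    loss_op = tf.reduce_sum(tf.square(w))   # scalar loss
    optimizer = tf.train.GradientDescentOptimizer(0.1)

    # grad_loss apparently has to match loss_op in shape and dtype,
    # i.e. be a scalar here, not a list of per-variable tensors.
    train_op = optimizer.minimize(loss_op, grad_loss=tf.ones_like(loss_op))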

I would appreciate any help with this simple example - how can I add elements to the gradients this way?

EDIT: I think the question boils down to the definition of grad_loss: "A Tensor holding the gradient computed for loss." How can I create such a tensor from the set of gradients obtained with compute_gradients?

Thanks.

1 answer

You can use tf.convert_to_tensor to convert the list of gradients to a tensor, and then use tf.reduce_sum:

    train_op = optimizer.minimize(loss_op,
                                  grad_loss=tf.reduce_sum(tf.convert_to_tensor(grad_loss)))
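Note that tf.convert_to_tensor only works here if all the gradients in grad_loss have the same shape (it stacks the list into a single tensor). If the variables have different shapes, one alternative sketch of the same idea is to reduce each ones-tensor to a scalar first and sum the scalars with tf.add_n:

    # Sketch for differently shaped gradients: reduce each tensor in the
    # list to a scalar, then sum the scalars into a single grad_loss tensor.
    grad_loss_scalar = tf.add_n([tf.reduce_sum(g) for g in grad_loss])
    train_op = optimizer.minimize(loss_op, grad_loss=grad_loss_scalar)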

Source: https://habr.com/ru/post/1275760/

