TensorFlow loss goes to NaN

I use tf.nn.sigmoid_cross_entropy_with_logits as the loss function, and the loss goes to NaN.

I already use gradient clipping; in the one place where a tensor division is performed I add an epsilon to prevent division by zero, and I also add an epsilon to the arguments of all softmax functions.
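For concreteness, here is a minimal sketch of the kind of epsilon stabilization I mean (TF 1.x style; the tensor names and the value 1e-8 are placeholders, not taken from my actual code):

import tensorflow as tf

eps = 1e-8  # placeholder value, not necessarily the one used in the real code

# Stand-in tensors; in the real model these come from the NTM heads.
numerator = tf.placeholder(tf.float32, [None])
denominator = tf.placeholder(tf.float32, [None])
logits = tf.placeholder(tf.float32, [None, 8])

# Division guarded against a zero denominator.
safe_ratio = numerator / (denominator + eps)

# Softmax whose argument is offset by epsilon.
softmax_weights = tf.nn.softmax(logits + eps)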

However, I get NaN in the middle of training.

Are there any known issues where TensorFlow does this that I might have missed? This is quite frustrating, because the loss randomly goes to NaN during training and ruins everything.

In addition, how can I detect that a training step is going to produce NaN and, if so, skip that example altogether? Any suggestions?
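To illustrate the kind of check I have in mind, here is a rough sketch (TF 1.x style, with a stand-in model rather than the NTM code): evaluate the loss first and only apply the update when it is finite.

import numpy as np
import tensorflow as tf

# Stand-in model, not the NTM; only the skip logic at the bottom matters.
x = tf.placeholder(tf.float32, [None, 4])
y = tf.placeholder(tf.float32, [None, 1])
logits = tf.layers.dense(x, 1)
loss = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(labels=y, logits=logits))
train_step = tf.train.AdamOptimizer(1e-3).minimize(loss)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for _ in range(100):
        xb = np.random.randn(32, 4).astype(np.float32)
        yb = (np.random.rand(32, 1) > 0.5).astype(np.float32)
        # Evaluate the loss first; only run the update if it is finite.
        loss_val = sess.run(loss, feed_dict={x: xb, y: yb})
        if np.isfinite(loss_val):
            sess.run(train_step, feed_dict={x: xb, y: yb})
        else:
            print("skipping batch, loss =", loss_val)

This runs the forward pass twice per batch; tf.check_numerics and tf.add_check_numerics_ops() are another option, raising an error at the first op that produces NaN or Inf, which also helps locate where it first appears.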

EDIT: The network is a Neural Turing Machine.

EDIT 2: I have posted part of the code here. It is not commented and will only make sense to someone who has already read the NTM paper by Graves et al., available here: https://arxiv.org/abs/1410.5401

I'm not sure that all of my code follows exactly what the paper's authors intended. I'm only doing this as practice, and I have no mentor to correct me.

EDIT 3: Here is the code for clipping the gradient:

optimizer = tf.train.AdamOptimizer(self.lr)
gvs = optimizer.compute_gradients(loss)
# Clip each gradient element-wise to [-1, 1]; variables with no gradient (None) are passed through unchanged.
capped_gvs = [(tf.clip_by_value(grad, -1.0, 1.0), var) if grad is not None else (grad, var)
              for grad, var in gvs]
train_step = optimizer.apply_gradients(capped_gvs)

I had to add the check for grad being None because I got an error without it (compute_gradients returns None for variables that have no gradient with respect to the loss). Could the problem be here?
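For reference, a common alternative (not what my code currently does) is to clip by the global norm of all gradients instead of element-wise; tf.clip_by_global_norm ignores None entries, so no explicit check is needed. The sketch below reuses the same optimizer, self.lr and loss as above, and clip_norm=10.0 is just a placeholder value.

optimizer = tf.train.AdamOptimizer(self.lr)
gvs = optimizer.compute_gradients(loss)
grads, variables = zip(*gvs)
# None gradients are passed through untouched; clip_norm=10.0 is a placeholder.
clipped_grads, _global_norm = tf.clip_by_global_norm(grads, clip_norm=10.0)
train_step = optimizer.apply_gradients(list(zip(clipped_grads, variables)))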

Potential solution: I have been using tf.contrib.losses.sigmoid_cross_entropy for a while now, and so far the loss has not diverged. I will run a few more experiments and report back.


