TensorFlow ReluGrad claims input is not finite

I am experimenting with TensorFlow and I ran into a strange bug. I edited the deep MNIST example to use a different set of images, and the algorithm converges nicely again until around iteration 8000 (91% accuracy at that point), when it crashes with the following error.

tensorflow.python.framework.errors.InvalidArgumentError: ReluGrad input is not finite 

At first I thought some coefficients had reached the float limit, but adding L2 regularization on all weights and biases did not solve the problem. According to the stack trace, it is always the first ReLU application that fails:

 h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1) + b_conv1) 

I am running on CPU only. Any idea what might cause this and how to work around it?

Edit: I traced this down to the same issue as Tensorflow NaN error? The solution posted there works here as well.

+6
tensorflow gradient-descent
Nov 13 '15 at 18:07
4 answers

Since I had another question open on this issue [ Tensorflow NaN error? ], I did not keep this one updated, but the solution has been available for a while and has since been echoed by posters here. The problem is indeed that 0 * log(0) results in NaN.
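For illustration, a tiny snippet (TF 1.x-era API, names purely illustrative) reproducing the NaN from a softmax output that has saturated to exactly 0 and 1:

    import tensorflow as tf

    sess = tf.Session()
    y = tf.constant([1.0, 0.0])  # a saturated softmax output
    # 0 * log(0) = 0 * -inf = nan, so the whole sum becomes nan
    print(sess.run(-tf.reduce_sum(y * tf.log(y))))  # prints nan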

One option is to use the line Muaaz suggested here, or the one I wrote in the linked topic. But in the end TensorFlow has this routine built in: tf.nn.softmax_cross_entropy_with_logits . It is more efficient and numerically safer, so it should be preferred where possible, instead of what Muaaz and I suggested earlier, as a commenter pointed out on the linked topic.
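A minimal sketch of how that would look (TF 1.x-style call; `y_` and `logits` are assumed to be your one-hot labels and the pre-softmax output of the network, and argument names/order differ slightly between TensorFlow versions):

    # The op applies softmax itself in a numerically stable way,
    # so feed it raw logits, not tf.nn.softmax(...) output.
    cross_entropy = tf.reduce_mean(
        tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=logits))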

+3
Jan 18 '16 at 15:31

The error is caused by 0 * log(0).

This can be avoided:

 cross_entropy = -tf.reduce_sum(y * tf.log(yconv + 1e-9))  # the small epsilon keeps log() away from log(0)
+9
Dec 18 '15 at 18:44

I have run into this input is not finite error before (not with tf.nn.relu ). In my case the problem was that elements of my tensor variable reached very large values (large enough to be flagged as infinite, hence the input is not finite message).

I would suggest adding some debug output around tf.nn.relu(conv2d(x_image, W_conv1) + b_conv1) every n-th iteration to track down when it first becomes infinite; see the sketch below.
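A rough sketch of what that could look like, assuming the usual deep MNIST tutorial names ( mnist , train_step , h_conv1 , x , y_ , keep_prob ) and an InteractiveSession, with NumPy for the finiteness check:

    import numpy as np

    for i in range(20000):
        batch = mnist.train.next_batch(50)
        train_step.run(feed_dict={x: batch[0], y_: batch[1], keep_prob: 0.5})
        if i % 100 == 0:
            # evaluate the first ReLU input/activations and stop as soon as they blow up
            h_val = h_conv1.eval(feed_dict={x: batch[0], keep_prob: 1.0})
            if not np.isfinite(h_val).all():
                print("non-finite values in h_conv1 at iteration", i)
                break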

This is consistent with your comment:

When I change the value to 1e-3, the crash occurs much earlier. However, changing it to 1e-5 prevents the algorithm from converging.

+1
Nov 13 '15 at 21:46

I cannot comment because of reputation, but Muaaz has the answer. The error can be reproduced by training a system that reaches 0 error, which results in log(0). His solution prevents this. Alternatively, catch the error and move on:

    # ...your other code...
    try:
        for i in range(10000):
            train_accuracy = accuracy.eval(feed_dict={
                x: batch_xs, y_: batch_ys, keep_prob: 1.0})
    except:
        print("training interrupted. Hopefully deliberately")
+1
Jan 16 '16 at 16:48