Reasons for NaN in Deep Learning

This may be too general a question, but can someone explain what can cause a convolutional neural network to diverge?

Specifics:

I am using TensorFlow's iris_training model with some of my own data and keep getting

ERROR:tensorflow:Model diverged with loss = NaN.

Traceback ...

tensorflow.contrib.learn.python.learn.monitors.NanLossDuringTrainingError: NaN loss during training.

The traceback originated from this line:

 tf.contrib.learn.DNNClassifier(feature_columns=feature_columns,
                                hidden_units=[300, 300, 300],
                                #optimizer=tf.train.ProximalAdagradOptimizer(learning_rate=0.001, l1_regularization_strength=0.00001),
                                n_classes=11,
                                model_dir="/tmp/iris_model")

I have tried adjusting the optimizer, using a value of zero for the learning rate, and using no optimizer at all. Any insight into network layers, data size, etc. is appreciated.

+45
7 answers

There are many things I have seen cause a model to diverge.

  1. Too high a learning rate. You can often tell this is the case if the loss begins to increase and then diverges to infinity.

  2. I am not that familiar with DNNClassifier, but I am guessing it uses the categorical cross-entropy cost function. This involves taking the log of the prediction, which diverges as the prediction approaches zero. That is why people usually add a small epsilon value to the prediction to prevent this divergence. I would guess DNNClassifier probably does this already or uses the TensorFlow op for it. Probably not the issue.

  3. Other numerical stability issues can exist, such as division by zero, where adding an epsilon can help. Another, less obvious one is the square root, whose derivative can diverge if not properly simplified when dealing with finite-precision numbers. Yet again, I doubt this is the issue in the case of DNNClassifier.

  4. You may have an issue with the input data. Try calling assert not np.any(np.isnan(x)) on the input data to make sure you are not feeding in NaNs. Also make sure all of the target values are valid. Finally, make sure the data is properly normalized; you probably want the pixels in the range [-1, 1] and not [0, 255] (see the sketch after this list).

  5. The labels must be in the domain of the loss function, so when using a logarithm-based loss function all labels must be non-negative (as noted by evan pu and the comments below).
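
As a minimal sketch of the checks in point 4 (the dummy arrays x and y are stand-ins for illustration; the class count 11 matches the question's n_classes):

 import numpy as np

 # stand-in data; in practice x and y are your real features and integer labels
 x = np.random.uniform(0, 255, size=(20, 4)).astype(np.float32)
 y = np.random.randint(0, 11, size=20)

 # no NaNs or infs should make it into the network
 assert not np.any(np.isnan(x)), "NaN found in the input features"
 assert np.all(np.isfinite(x)), "inf found in the input features"

 # every target must be a valid class index for n_classes=11
 assert np.all((y >= 0) & (y < 11)), "label outside the expected range 0..10"

 # normalize pixel-like data from [0, 255] into [-1, 1]
 x = x / 127.5 - 1.0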

+71

If you are training with cross-entropy, you want to add a small number, such as 1e-8, to your output probability.

Because log(0) is negative infinity, when your model is trained well enough the output distribution will become very skewed. For instance, say I am producing a 4-class output; at the beginning my probabilities look like

 0.25 0.25 0.25 0.25 

but by the end the probability will probably look like

 1.0 0 0 0 

And if you take the cross-entropy of this distribution, everything will explode. The fix is to artificially add a small number to all the terms to prevent this.
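
As a rough illustration in plain NumPy (the epsilon of 1e-8 comes from this answer; the arrays are made up):

 import numpy as np

 probs = np.array([1.0, 0.0, 0.0, 0.0])    # skewed output distribution from the example
 one_hot = np.array([0.0, 1.0, 0.0, 0.0])  # true class is the second one

 eps = 1e-8
 # naive cross-entropy: log(0) = -inf and 0 * -inf = NaN, so the loss becomes NaN
 naive_loss = -np.sum(one_hot * np.log(probs))
 # adding eps keeps every log finite, so the loss stays finite
 safe_loss = -np.sum(one_hot * np.log(probs + eps))

 print(naive_loss)  # nan
 print(safe_loss)   # about 18.4, i.e. -log(1e-8)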

+8

If you are using integers as targets, make sure they are not symmetric around 0.

I.e., don't use classes -1, 0, 1. Use 0, 1, 2 instead.
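
A minimal sketch of that remapping (variable names are illustrative):

 import numpy as np

 y = np.array([-1, 0, 1, 1, -1])  # labels symmetric around 0 can break sparse losses
 y = y + 1                        # shift to 0, 1, 2

 # for arbitrary label values, map them to 0..n_classes-1 instead:
 # _, y = np.unique(y, return_inverse=True)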

+3

In my case, I got NaNs when using distant integer labels. I.e.:

  • With labels [0..100] the training went fine,
  • With labels [0..100] plus one additional label 8000, I got NaNs.

So, do not use a very distant label.

EDIT You can see the effect in the following simple code:

 from keras.models import Sequential
 from keras.layers import Dense, Activation
 import numpy as np

 X = np.random.random(size=(20, 5))
 y = np.random.randint(0, high=5, size=(20, 1))

 model = Sequential([
     Dense(10, input_dim=X.shape[1]),
     Activation('relu'),
     Dense(5),
     Activation('softmax')
 ])
 model.compile(optimizer="Adam",
               loss="sparse_categorical_crossentropy",
               metrics=["accuracy"])

 print('fit model with labels in range 0..5')
 history = model.fit(X, y, epochs=5)

 X = np.vstack((X, np.random.random(size=(1, 5))))
 y = np.vstack((y, [[8000]]))

 print('fit model with labels in range 0..5 plus 8000')
 history = model.fit(X, y, epochs=5)

The output shows NaN losses appearing after the 8000 label is added:

 fit model with labels in range 0..5
 Epoch 1/5
 20/20 [==============================] - 0s 25ms/step - loss: 1.8345 - acc: 0.1500
 Epoch 2/5
 20/20 [==============================] - 0s 150us/step - loss: 1.8312 - acc: 0.1500
 Epoch 3/5
 20/20 [==============================] - 0s 151us/step - loss: 1.8273 - acc: 0.1500
 Epoch 4/5
 20/20 [==============================] - 0s 198us/step - loss: 1.8233 - acc: 0.1500
 Epoch 5/5
 20/20 [==============================] - 0s 151us/step - loss: 1.8192 - acc: 0.1500
 fit model with labels in range 0..5 plus 8000
 Epoch 1/5
 21/21 [==============================] - 0s 142us/step - loss: nan - acc: 0.1429
 Epoch 2/5
 21/21 [==============================] - 0s 238us/step - loss: nan - acc: 0.2381
 Epoch 3/5
 21/21 [==============================] - 0s 191us/step - loss: nan - acc: 0.2381
 Epoch 4/5
 21/21 [==============================] - 0s 191us/step - loss: nan - acc: 0.2381
 Epoch 5/5
 21/21 [==============================] - 0s 188us/step - loss: nan - acc: 0.2381
+3

If you want to gather more information about the error, and the error occurs in the first few iterations, I suggest running the experiment in CPU-only mode (no GPUs). The error message will be much more specific.

Source: https://github.com/tensorflow/tensor2tensor/issues/574
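
One way to do that (a sketch, not specific to the linked issue) is to hide the GPUs before TensorFlow is imported:

 import os
 os.environ["CUDA_VISIBLE_DEVICES"] = ""  # empty string hides all GPUs, forcing CPU-only execution
 import tensorflow as tf                  # must come after the environment variable is set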

+1

Labels must lie between 0 and the number of classes, exclusive (i.e. 0 <= label < n_classes).

0

Regularization can help. For a classifier, there is a good case for activity regularization, whether it is a binary or a multi-class classifier. For a regressor, kernel regularization might be more appropriate.
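
A rough Keras sketch of that distinction (layer sizes and regularization strengths are made up):

 from keras.layers import Dense
 from keras.regularizers import l2

 # classifier: penalize the layer's activations (activity regularization)
 clf_output = Dense(5, activation='softmax', activity_regularizer=l2(1e-5))

 # regressor: penalize the layer's weights (kernel regularization)
 reg_output = Dense(1, activation='linear', kernel_regularizer=l2(1e-4))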

0


