Extremely noisy training loss

I am training an attention-based encoder-decoder model with a batch size of 8. I don't suspect much noise in the dataset itself, but the examples are drawn from several different distributions.

I see a lot of noise in the training loss curve. After smoothing with an exponential moving average (decay 0.99), the trend looks fine. The model's accuracy is also not bad.
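The smoothing mentioned above can be sketched as follows. This is a minimal numpy illustration: the decay of 0.99 comes from the question, but the synthetic loss curve and noise level are made up.

```python
import numpy as np

def smooth(losses, decay=0.99):
    """Exponential moving average of a loss curve (the same kind of
    smoothing TensorBoard applies to scalar plots)."""
    avg, out = None, []
    for loss in losses:
        avg = loss if avg is None else decay * avg + (1 - decay) * loss
        out.append(avg)
    return np.array(out)

# A noisy but downward-trending synthetic loss curve.
rng = np.random.default_rng(0)
steps = np.arange(2000)
raw = 1.0 / (1 + 0.01 * steps) + rng.normal(0, 0.3, size=steps.size)
smoothed = smooth(raw)

# Step-to-step variation is far smaller after smoothing, while the
# overall trend is preserved.
```

Plotting `raw` and `smoothed` together reproduces the two curves the question describes: a noisy raw loss and a clean averaged trend.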

I would like to understand what could cause this shape of the loss curve.

[images: noisy train loss / averaged train loss]

3 answers

I found the answer myself.

My model is based on the TensorFlow seq2seq tutorial (an encoder-decoder with attention), and it computes the loss the same way that tutorial does.

It turned out that the noise was coming not from the data, but from how the loss is normalized in that code.

https://www.tensorflow.org/tutorials/seq2seq

Look at how the loss is computed there: it is divided by batch_size, i.e. "normalized" only by batch_size. But it is a sum over (batch_size * num_time_steps) terms, so it is not a per-token average, and its magnitude (and that of the gradient) grows with the sequence length. So if you train with plain SGD and a learning rate of 1.0 as in the tutorial, you should scale the rate by 1/num_time_steps.
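A numpy sketch of this normalization issue. The per-token loss values are invented for illustration; batch_size and num_time_steps are the names from the answer above.

```python
import numpy as np

rng = np.random.default_rng(1)
batch_size, num_time_steps = 8, 50

# Hypothetical cross-entropy loss for every token in the batch.
per_token_loss = rng.uniform(0.5, 2.0, size=(batch_size, num_time_steps))

# Normalizing only by batch_size (as in the tutorial) leaves a loss
# that is still summed over time steps:
loss_per_batch = per_token_loss.sum() / batch_size

# Normalizing by batch_size * num_time_steps gives a true per-token loss:
loss_per_token = per_token_loss.sum() / (batch_size * num_time_steps)

# The two differ by exactly a factor of num_time_steps, so SGD with
# learning rate 1.0 on the first behaves like SGD with learning rate
# num_time_steps on the second -- hence the 1/num_time_steps scaling.
```

The same factor also varies with the (padded) length of each batch, which is one source of batch-to-batch loss variation.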

After fixing the normalization, the loss curve became much smoother.

P.S. Note also that with a batch size as small as 8, some batch-to-batch noise in the loss is to be expected anyway.


Another possible explanation:

Each mini-batch is only a small random sample of the dataset, so the loss computed on it is a noisy estimate of the true loss. With a batch size of 8 and examples drawn from several different distributions, the mix of examples varies a lot from batch to batch, and so does the loss. Increasing the batch size (or averaging, as you already do) reduces this variance.
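Assuming the batch-variance explanation above, here is a small numpy sketch. The two-distribution loss pool is made up to mimic "examples from several distributions"; the means 0.5 and 2.0 are illustration values only.

```python
import numpy as np

rng = np.random.default_rng(2)

# Per-example losses drawn from two different distributions, standing in
# for a dataset whose examples come from several distributions.
pool = np.concatenate([rng.normal(0.5, 0.2, 5000),
                       rng.normal(2.0, 0.2, 5000)])

def batch_loss_std(batch_size, n_batches=2000):
    """Std of the mean loss across random mini-batches of a given size."""
    batches = rng.choice(pool, size=(n_batches, batch_size))
    return batches.mean(axis=1).std()

# The reported loss fluctuates much more at batch size 8 than at 128:
noisy, smooth = batch_loss_std(8), batch_loss_std(128)
```

Since the std of a batch mean scales as 1/sqrt(batch_size), going from 8 to 128 examples should cut the loss noise by roughly a factor of 4.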

[image: loss curve]


Another common cause of a noisy loss curve is a learning rate that is too high. With a high learning rate, each update step can jump past the minimum, so the loss bounces up and down instead of decreasing smoothly.

Try lowering the learning rate and see whether the curve becomes smoother. Here is an illustration:

[image: loss function of a one-parameter model, with a gradient step overshooting the minimum]

Think of this image as the loss function of a model with a single parameter. We take the gradient at the current point and multiply it by the learning rate to draw a line segment in the descent direction (not shown). We then take the x-value at the end of this segment as our updated parameter, and finally compute the loss at this new parameter setting.

If our learning rate is too high, we overshoot the minimum the gradient was pointing toward and may end up with a greater loss, as shown in the figure.
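The overshoot is easy to check numerically on a one-parameter quadratic loss (a toy illustration, not the asker's model; the learning rates 0.1 and 1.1 are chosen to show both regimes):

```python
# One-parameter model with loss L(w) = w**2 and gradient dL/dw = 2*w.
def step(w, lr):
    """One gradient-descent update."""
    return w - lr * 2 * w

w = 1.0
# A small learning rate moves toward the minimum at w = 0:
w_small = step(w, 0.1)   # 0.8, so the loss drops from 1.0 to 0.64
# A too-large one jumps past the minimum and the loss grows:
w_large = step(w, 1.1)   # -1.2, so the loss rises from 1.0 to 1.44
```

Repeating the large-rate step makes |w| grow every iteration, which on a real model shows up as a loss that oscillates or diverges rather than decreasing smoothly.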


Source: https://habr.com/ru/post/1693068/

