Extremely noisy training loss

I am training an attention-based encoder-decoder model with a batch size of 8. I don't suspect much noise in the dataset itself, but the examples are drawn from several different distributions.

I see a lot of noise in the training loss curve. After smoothing with an exponential moving average (decay 0.99), the trend looks fine. The model's accuracy is also not bad.
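The smoothing mentioned above can be sketched as follows. This is a minimal numpy illustration: the decay of 0.99 comes from the question, but the synthetic loss curve and noise level are made up.

```python
import numpy as np

def smooth(losses, decay=0.99):
    """Exponential moving average of a loss curve (the same kind of
    smoothing TensorBoard applies to scalar plots)."""
    avg, out = None, []
    for loss in losses:
        avg = loss if avg is None else decay * avg + (1 - decay) * loss
        out.append(avg)
    return np.array(out)

# A noisy but downward-trending synthetic loss curve.
rng = np.random.default_rng(0)
steps = np.arange(2000)
raw = 1.0 / (1 + 0.01 * steps) + rng.normal(0, 0.3, size=steps.size)
smoothed = smooth(raw)

# Step-to-step variation is far smaller after smoothing, while the
# overall trend is preserved.
```

Plotting `raw` and `smoothed` together reproduces the two curves the question describes: a noisy raw loss and a clean averaged trend.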

I would like to understand what could cause this shape of the loss curve.

[images: noisy train loss / averaged train loss]

3 answers

I found the answer myself.

My model is based on the TensorFlow seq2seq tutorial (an encoder-decoder with attention), and it computes the loss the same way that tutorial does.

It turned out that the noise was coming not from the data, but from how the loss is normalized in that code.

https://www.tensorflow.org/tutorials/seq2seq

Look at how the loss is computed there: it is divided by batch_size, i.e. "normalized" only by batch_size. But it is a sum over (batch_size * num_time_steps) terms, so it is not a per-token average, and its magnitude (and that of the gradient) grows with the sequence length. So if you train with plain SGD and a learning rate of 1.0 as in the tutorial, you should scale the rate by 1/num_time_steps.
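A numpy sketch of this normalization issue. The per-token loss values are invented for illustration; batch_size and num_time_steps are the names from the answer above.

```python
import numpy as np

rng = np.random.default_rng(1)
batch_size, num_time_steps = 8, 50

# Hypothetical cross-entropy loss for every token in the batch.
per_token_loss = rng.uniform(0.5, 2.0, size=(batch_size, num_time_steps))

# Normalizing only by batch_size (as in the tutorial) leaves a loss
# that is still summed over time steps:
loss_per_batch = per_token_loss.sum() / batch_size

# Normalizing by batch_size * num_time_steps gives a true per-token loss:
loss_per_token = per_token_loss.sum() / (batch_size * num_time_steps)

# The two differ by exactly a factor of num_time_steps, so SGD with
# learning rate 1.0 on the first behaves like SGD with learning rate
# num_time_steps on the second -- hence the 1/num_time_steps scaling.
```

The same factor also varies with the (padded) length of each batch, which is one source of batch-to-batch loss variation.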

After fixing the normalization, the loss curve became much smoother.

P.S. Note also that with a batch size as small as 8, some batch-to-batch noise in the loss is to be expected anyway.


Another possible explanation:

Each mini-batch is only a small random sample of the dataset, so the loss computed on it is a noisy estimate of the true loss. With a batch size of 8 and examples drawn from several different distributions, the mix of examples varies a lot from batch to batch, and so does the loss. Increasing the batch size (or averaging, as you already do) reduces this variance.
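Assuming the batch-variance explanation above, here is a small numpy sketch. The two-distribution loss pool is made up to mimic "examples from several distributions"; the means 0.5 and 2.0 are illustration values only.

```python
import numpy as np

rng = np.random.default_rng(2)

# Per-example losses drawn from two different distributions, standing in
# for a dataset whose examples come from several distributions.
pool = np.concatenate([rng.normal(0.5, 0.2, 5000),
                       rng.normal(2.0, 0.2, 5000)])

def batch_loss_std(batch_size, n_batches=2000):
    """Std of the mean loss across random mini-batches of a given size."""
    batches = rng.choice(pool, size=(n_batches, batch_size))
    return batches.mean(axis=1).std()

# The reported loss fluctuates much more at batch size 8 than at 128:
noisy, smooth = batch_loss_std(8), batch_loss_std(128)
```

Since the std of a batch mean scales as 1/sqrt(batch_size), going from 8 to 128 examples should cut the loss noise by roughly a factor of 4.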

[image: loss curve]


Another common cause of a noisy loss curve is a learning rate that is too high. With a high learning rate, each update step can jump past the minimum, so the loss bounces up and down instead of decreasing smoothly.

Try lowering the learning rate and see whether the curve becomes smoother. Here is an illustration:

[image: loss function of a one-parameter model, with a gradient step overshooting the minimum]

Think of this image as the loss function of a model with a single parameter. We take the gradient at the current point and multiply it by the learning rate to draw a line segment in the descent direction (not shown). We then take the x-value at the end of this segment as our updated parameter, and finally compute the loss at this new parameter setting.

If our learning rate is too high, we overshoot the minimum the gradient was pointing toward and may end up with a greater loss, as shown in the figure.
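The overshoot is easy to check numerically on a one-parameter quadratic loss (a toy illustration, not the asker's model; the learning rates 0.1 and 1.1 are chosen to show both regimes):

```python
# One-parameter model with loss L(w) = w**2 and gradient dL/dw = 2*w.
def step(w, lr):
    """One gradient-descent update."""
    return w - lr * 2 * w

w = 1.0
# A small learning rate moves toward the minimum at w = 0:
w_small = step(w, 0.1)   # 0.8, so the loss drops from 1.0 to 0.64
# A too-large one jumps past the minimum and the loss grows:
w_large = step(w, 1.1)   # -1.2, so the loss rises from 1.0 to 1.44
```

Repeating the large-rate step makes |w| grow every iteration, which on a real model shows up as a loss that oscillates or diverges rather than decreasing smoothly.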


Source: https://habr.com/ru/post/1693068/

