Think of this image as the loss function for a model with only one parameter. We take the gradient at a point and multiply it by the learning rate to project a line segment along the direction the gradient indicates (not shown); for descent, the step goes opposite to the gradient's sign. We then take the x-value at the end of this line segment as our updated parameter, and finally we compute the loss at this new parameter setting.
If our learning rate is too high, we overshoot the minimum the gradient was pointing toward and may end up with a higher loss than before, as shown in the figure.
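
The overshoot can be reproduced with a minimal sketch. The quadratic loss `x ** 2`, the starting point, and the two learning rates below are assumptions chosen for illustration; they are not taken from the figure.

```python
# Toy one-parameter loss (assumed for illustration): minimum at x = 0.
def loss(x):
    return x ** 2


# Derivative of the toy loss with respect to its single parameter.
def grad(x):
    return 2 * x


# One gradient descent update: move against the gradient,
# scaled by the learning rate.
def gradient_step(x, learning_rate):
    return x - learning_rate * grad(x)


x0 = 1.0                          # hypothetical starting parameter
small = gradient_step(x0, 0.1)    # modest learning rate
large = gradient_step(x0, 1.5)    # learning rate that is too high

print("start:", x0, "loss:", loss(x0))
print("lr=0.1:", small, "loss:", loss(small))
print("lr=1.5:", large, "loss:", loss(large))
```

With the small learning rate the parameter moves toward the minimum and the loss drops. With the large one the update jumps past the minimum to the other side, and the loss ends up higher than where it started.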
Imran