Why is data scaling so important in a neural network (LSTM)?

Question

I am writing my thesis on using LSTM neural networks for time series. In my experiments, I found that scaling the data can have a big impact on the result. For example, when I use the tanh activation function and the input values are scaled to the range -1 to 1, the model seems to converge faster, and the validation error no longer fluctuates sharply from one epoch to the next.
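To make the setup concrete, scaling a series into [-1, 1] can be sketched as below. This is only a minimal illustration, not the code from the experiment: the function name `scale_to_range` and the sample values are hypothetical, and NumPy is assumed.

```python
import numpy as np

def scale_to_range(x, lo=-1.0, hi=1.0):
    """Min-max scale an array linearly into [lo, hi] (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    x_min, x_max = x.min(), x.max()
    if x_max == x_min:
        # A constant series has no spread; map it to the midpoint of the range.
        return np.full_like(x, (lo + hi) / 2.0)
    return lo + (x - x_min) * (hi - lo) / (x_max - x_min)

# Made-up sample values standing in for a raw time series.
series = np.array([120.0, 135.0, 150.0, 180.0])
scaled = scale_to_range(series)
print(scaled)
```

In practice the minimum and maximum are usually computed on the training split only and then reused for the validation and test data.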

Does anyone know of a mathematical explanation for this? Or are there any papers that already explain this behavior?

neural-network lstm backpropagation

Thanh quang Oct 11 '17 at 11:19

2 answers