Why is data scaling so important in a neural network (LSTM)?

I am writing my thesis on using LSTM neural networks for time series. In my experiments I found that scaling the data can have a big impact on the results. For example, when I use the tanh activation function and the values are scaled to the range [-1, 1], the model seems to converge faster, and the validation error does not jump around sharply after each epoch.

Does anyone know of a mathematical explanation for this? Or are there any papers that discuss it?

+4
2 answers

There is a good video explanation of this (see around 3:02, where normalizing inputs is discussed).

[image]

In short: when the input features are on very different scales, the contours of the cost function are stretched into long, narrow ellipses, so gradient descent oscillates across the narrow direction and makes slow progress along the long one. After scaling, the contours are close to circular and the optimizer takes a much more direct path to the minimum.
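A minimal sketch of this conditioning effect (the data and learning rates below are made up for illustration): plain gradient descent on a least-squares problem with one small-scale and one large-scale feature, before and after standardization.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical data: feature 1 lives in [0, 1], feature 2 in [0, 1000].
X = np.column_stack([rng.uniform(0, 1, 200), rng.uniform(0, 1000, 200)])
y = X @ np.array([2.0, 0.003]) + rng.normal(0, 0.01, 200)

def gd_steps(X, y, lr, tol=1e-6, max_steps=200_000):
    """Gradient descent on mean squared error; steps until the gradient is tiny."""
    w = np.zeros(X.shape[1])
    for step in range(1, max_steps + 1):
        grad = 2.0 / len(y) * X.T @ (X @ w - y)
        w -= lr * grad
        if np.linalg.norm(grad) < tol:
            return step
    return max_steps  # did not converge within the budget

# Raw features: the large feature forces a tiny learning rate, so progress
# along the small feature's direction is glacial (this hits the step cap).
print("raw:   ", gd_steps(X, y, lr=1e-7))

# Standardized features: near-circular contours, converges in a few dozen steps.
Xs = (X - X.mean(axis=0)) / X.std(axis=0)
print("scaled:", gd_steps(Xs, y, lr=0.1))
```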

+3

There is more than one reason. First, saturation: activation functions (sigmoid, tanh, ...) are only responsive in a narrow band around zero. tanh, for example, is roughly linear on [-1, +1], while for inputs in, say, [10, ∞) it is essentially constant at 1, so its derivative there is practically zero. A saturated unit passes back almost no gradient, and learning stalls.
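You can see the saturation numerically in a couple of lines (a minimal sketch, nothing here is specific to LSTMs):

```python
import numpy as np

# tanh'(x) = 1 - tanh(x)^2: responsive near zero, dead once saturated.
for x in [0.0, 0.5, 1.0, 3.0, 10.0]:
    print(f"x = {x:5.1f}  tanh(x) = {np.tanh(x):+.6f}  tanh'(x) = {1 - np.tanh(x)**2:.2e}")
```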

Second, the size of the gradients: large inputs mean large pre-activations and, with them, large weight updates, which destabilizes training and makes the loss jump around. Both effects are worse in an RNN, because the same weights are applied at every time step.

The hidden state is fed back through the recurrent connections, so anything too large or too small gets amplified (or damped) exponentially over the length of the sequence, gradients explode or vanish, etc.

Remember that an RNN is a dynamical system - its stability depends on the scale of its weights, states, and inputs.

, "" , ( )

So scaling influences not just the optimization but the dynamics and stability of the RNN itself; the toy recurrence below makes this concrete. That is another reason why RNN inputs should be kept in a controlled range.
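A minimal sketch (a scalar stand-in for a real RNN, with made-up weights) showing how unscaled inputs pin the hidden state in tanh's saturated region:

```python
import numpy as np

rng = np.random.default_rng(1)

def run_rnn(x, w=0.9, u=1.0):
    """Scalar recurrence h_t = tanh(w*h_{t-1} + u*x_t); returns the trajectory."""
    h, traj = 0.0, []
    for x_t in x:
        h = np.tanh(w * h + u * x_t)
        traj.append(h)
    return np.array(traj)

x_raw = rng.normal(0, 100, 50)     # unscaled input series
x_scaled = x_raw / x_raw.std()     # same series, unit variance

# Raw inputs slam the state to +/-1 at every step (mean |h| close to 1.0,
# gradient through tanh near zero); scaled inputs keep it responsive.
print("raw:    mean |h| =", np.abs(run_rnn(x_raw)).mean())
print("scaled: mean |h| =", np.abs(run_rnn(x_scaled)).mean())
```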

So when preparing data for an RNN, think about two things:

  • the dynamics of the RNN itself: keep weights, states, and inputs at a scale where the network is stable and the activations stay responsive

  • the preprocessing: standardization, min-max scaling, detrending, differencing, ... whatever keeps the inputs inside the "sensitive range" of the activation function (see the sketch after this list)
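For example, min-max scaling to [-1, 1] with scikit-learn (the sine series is a made-up stand-in for real data; note the scaler is fit on the training split only):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

series = 100 + 50 * np.sin(np.linspace(0, 20, 300))   # values far from [-1, 1]
train, test = series[:240].reshape(-1, 1), series[240:].reshape(-1, 1)

# Fit on the training portion only, so no test statistics leak into preprocessing.
scaler = MinMaxScaler(feature_range=(-1, 1))
train_scaled = scaler.fit_transform(train)
test_scaled = scaler.transform(test)

print(train_scaled.min(), train_scaled.max())         # -1.0 1.0
# Model outputs can be mapped back with scaler.inverse_transform(...).
```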

If the RNN's inputs, and therefore its hidden states, stay in the responsive range of the activation, gradients propagate through time much better, which is exactly the faster and smoother convergence you observed.

Two references, the first on the scaling behaviour of time series in general (not about RNNs), the second on RNN stability:

Time series (not RNN-specific): http://www.physics.mcgill.ca/~gang/eprints/eprintLovejoy/neweprint/Aegean.final.pdf

RNNs: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4107715/ - see: Echo State Networks Revisited

+1
