First off, let me apologize for cramming three questions into this title. I'm not sure there is a better way to do it.
I'll get right to it. I think I understand feedforward neural networks pretty well.
But LSTMs really elude me, and I feel it may be because I don't have a very good grasp of recurrent neural networks in general. I have taken Hinton's and Andrew Ng's courses on Coursera, and a lot of it still doesn't make sense to me.

From what I understand, recurrent neural networks differ from feedforward neural networks in that past values influence the next prediction. Recurrent neural networks are generally used for sequences.

The example of a recurrent neural network I saw was binary addition.
010 + 011
A recurrent neural network would take the rightmost 0 and 1 first and output a 1. Then it would take the 1 and 1 next, output a 0, and carry the 1. Then it would take the next 0 and 0 and output a 1, because it carried the 1 from the last calculation. Where does it store this 1? In feedforward networks the result is basically:
y = a(w*x + b) where w = weights of connections to previous layer and x = activation values of previous layer or inputs
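To make sure I'm reading that formula the right way, here is roughly how I picture a single feedforward layer in code (the names sigmoid, W, b are just my own illustration, not from any particular library):

    import numpy as np

    def sigmoid(z):
        # a(): the activation function
        return 1.0 / (1.0 + np.exp(-z))

    def feedforward_layer(x, W, b):
        # y = a(W*x + b): nothing is remembered between calls;
        # the output depends only on the current input x
        return sigmoid(W @ x + b)

    # example: 2 inputs -> 3 hidden units
    x = np.array([0.0, 1.0])
    W = np.random.randn(3, 2)
    b = np.zeros(3)
    print(feedforward_layer(x, W, b))

There is no state left over after the call, which is exactly why I don't see where the carried 1 would live.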
How is a recurrent neural network calculated? I am probably wrong, but from what I understood, a recurrent neural network is pretty much a feedforward neural network with T hidden layers, where T is the number of timesteps. Each hidden layer takes the input X at its timestep, and its outputs are then added to the inputs of the next timestep's hidden layer.
a(l) = a(w*x + b + pa)

where l = the current timestep, x = the input value at that timestep, w = the weights of the connections to the input layer, and pa = the past activation values of the hidden layer, such that neuron i in layer l uses the output value of neuron i in layer l-1

y = o(w*a(l-1) + b)

where w = the weights of the connections to the last hidden layer
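To make my mental model concrete, here is a small sketch of a plain recurrent layer unrolled over timesteps. I'm assuming the usual Elman-style update h_t = a(W_x*x_t + W_h*h_(t-1) + b), and all the names (W_x, W_h, W_y, h) are mine, so this may not be exactly what the courses meant:

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def rnn_forward(xs, W_x, W_h, W_y, b_h, b_y):
        # xs: list of input vectors, one per timestep
        h = np.zeros(W_h.shape[0])          # hidden state, carried across timesteps
        ys = []
        for x in xs:
            # the same weights are reused at every timestep;
            # h from the previous step is fed back in
            h = sigmoid(W_x @ x + W_h @ h + b_h)
            ys.append(sigmoid(W_y @ h + b_y))
        return ys

    # toy shapes: 2 inputs (one bit from each number), 3 hidden units, 1 output bit
    rng = np.random.default_rng(0)
    W_x, W_h = rng.normal(size=(3, 2)), rng.normal(size=(3, 3))
    W_y, b_h, b_y = rng.normal(size=(1, 3)), np.zeros(3), np.zeros(1)

    # 010 + 011, fed in right to left, one pair of bits per timestep
    bits = [np.array([0, 1]), np.array([1, 1]), np.array([0, 0])]
    print(rnn_forward(bits, W_x, W_h, W_y, b_h, b_y))

If I read this right, the hidden state h is where the carried 1 from the binary addition would be stored.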
But even if I understood this correctly, I don't see the benefit of doing this over simply feeding past values into a normal feedforward network as inputs (a sliding window, or whatever it's called).
For example, what is the advantage of using a recurrent neural network for binary addition instead of training a feedforward network with two output neurons, one for the binary result and one for the carry, and then taking the carry output and plugging it back into the feedforward network?
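Just so it's clear what wiring I have in mind with that alternative, here is a rough sketch (the names and the untrained weights are made up purely for illustration):

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def add_bits_feedforward(a_bits, b_bits, W, b):
        # a_bits / b_bits are the two numbers, least significant bit first
        carry = 0.0
        result = []
        for a_bit, b_bit in zip(a_bits, b_bits):
            x = np.array([a_bit, b_bit, carry])   # carry fed back in by hand
            sum_bit, carry = sigmoid(W @ x + b)   # two outputs: sum bit and new carry
            result.append(round(float(sum_bit)))
        return result

    W = np.random.randn(2, 3)   # would need to be trained, of course
    b = np.zeros(2)
    print(add_bits_feedforward([0, 1, 0], [1, 1, 0], W, b))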
However, I'm not sure how this is really different from simply having past values as inputs to the feedforward model.
It seems to me that the more timesteps there are, the more recurrent neural networks are at a disadvantage compared to feedforward networks, because of the vanishing gradient. Which brings me to my second question: from what I understand, LSTMs are a solution to the vanishing gradient problem. But I have no real grasp of how they work. Furthermore, are they simply better than recurrent neural networks, or are there trade-offs to using an LSTM?
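For reference, this is the standard LSTM cell update as I've seen it written, with the usual input/forget/output gates. I'm only including my own sketch of it so it's clear which formulation I'm asking about, and it may well contain misunderstandings:

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def lstm_step(x, h_prev, c_prev, W, b):
        # one step of a standard LSTM cell; W stacks the four gate weight matrices
        z = W @ np.concatenate([x, h_prev]) + b
        i, f, o, g = np.split(z, 4)
        i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)   # input, forget, output gates
        g = np.tanh(g)                                 # candidate cell update
        c = f * c_prev + i * g      # cell state: mostly additive, which is
                                    # supposedly what helps with vanishing gradients
        h = o * np.tanh(c)          # hidden state / output at this timestep
        return h, c

    n_x, n_h = 2, 3
    rng = np.random.default_rng(1)
    W = rng.normal(size=(4 * n_h, n_x + n_h))
    b = np.zeros(4 * n_h)
    h, c = lstm_step(np.array([0.0, 1.0]), np.zeros(n_h), np.zeros(n_h), W, b)
    print(h, c)

The cell state update being mostly additive is apparently what helps with the vanishing gradient, but I don't really understand why, or what it costs.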