I implemented a neural network in CUDA with two layers (2 neurons per layer). I'm trying to get it to learn two simple quadratic polynomial functions using backpropagation.
But instead of converging, it diverges (the outputs grow without bound).
Here are some details about what I tried:
- I set the initial weights to 0, but since it was diverging, I randomized the starting weights
- I read that a neural network can diverge if the learning rate is too high, so I reduced the learning rate to 0.000001 (see the update-rule sketch after this list)
- The two functions I'm trying to approximate are 3 * i + 7 * j + 9 and j*j + i*i + 24 (I feed i and j in as the inputs; see the data sketch after this list). I implemented this as a single layer before, and that version approximated the polynomial functions better.
- I am thinking about introducing momentum into this network, but I'm not sure whether that will help it converge (the update-rule sketch below includes a momentum term)
- I use a linear activation function, i.e. effectively none (see the forward-pass sketch below)
- At the beginning there is some fluctuation, but the output begins to diverge as soon as any of the weights grows above 1
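
For reference, here is roughly how I generate the training pairs. This is a sketch with illustrative names (including the input range, which I picked just for the example), not my exact code:

```cuda
#include <cstdlib>

// One training sample: inputs (i, j) and the two target outputs.
struct Sample { float in[2]; float target[2]; };

Sample make_sample() {
    Sample s;
    s.in[0] = (float)rand() / RAND_MAX;        // i, drawn from [0, 1] (range is an assumption)
    s.in[1] = (float)rand() / RAND_MAX;        // j
    float i = s.in[0], j = s.in[1];
    s.target[0] = 3.0f * i + 7.0f * j + 9.0f;  // first target function
    s.target[1] = j * j + i * i + 24.0f;       // second target function
    return s;
}
```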
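
The update step is plain SGD on every weight; the momentum variant I'm considering would look roughly like this (a sketch, not my exact kernel; with beta = 0 it reduces to what I run now):

```cuda
// Per-weight update as a CUDA kernel: SGD with an optional momentum term.
// velocity[] is zero-initialized; beta = 0 gives plain SGD.
__global__ void update_weights(float *w, float *velocity, const float *grad,
                               float lr, float beta, int n) {
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n) {
        velocity[idx] = beta * velocity[idx] - lr * grad[idx];  // accumulate velocity
        w[idx] += velocity[idx];                                // apply the step
    }
}
```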
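
And the forward pass is just two matrix-vector products with nothing nonlinear in between (a host-side sketch of the same computation my kernel does):

```cuda
// Forward pass of the 2-2-2 network with identity (linear) activations.
// Note that with no nonlinearity, the two layers compose into a single
// linear map of the inputs.
void forward(const float W1[2][2], const float b1[2],
             const float W2[2][2], const float b2[2],
             const float in[2], float out[2]) {
    float hidden[2];
    for (int k = 0; k < 2; ++k)   // layer 1: hidden = W1 * in + b1
        hidden[k] = W1[k][0] * in[0] + W1[k][1] * in[1] + b1[k];
    for (int k = 0; k < 2; ++k)   // layer 2: out = W2 * hidden + b2
        out[k] = W2[k][0] * hidden[0] + W2[k][1] * hidden[1] + b2[k];
}
```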
I checked and double-checked my code, but I couldn't find any problem with it.
So here is my question: what is going on here?
Any pointers would be appreciated.