Neural network diverging instead of converging

I implemented a neural network (using CUDA) with two layers (2 neurons per layer). I'm trying to get it to learn two simple quadratic polynomial functions using backpropagation.

But instead of converging, it diverges (the output grows without bound).

Here are some details about what I tried:

  • I set the initial weights to 0, but since it was diverging, I randomized the initial weights instead
  • I read that a neural network can diverge if the learning rate is too high, so I reduced the learning rate to 0.000001
  • The two functions I'm trying to learn are: 3*i + 7*j + 9 and j*j + i*i + 24 (I feed i and j into the input layer)
  • I implemented it with a single layer before, and it approximated the polynomial functions better
  • I am thinking about adding momentum to this network, but I'm not sure whether it would help it converge
  • I use a linear activation function (i.e. effectively none); see the sketch after this list
  • At the beginning there is some fluctuation, but the output begins to diverge as soon as any of the weights grows above 1

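For concreteness, here is a minimal CPU sketch (plain C++, not the actual CUDA code) of the setup as I read it, under my own assumptions: 2 inputs (i, j), a hidden layer of 2 linear neurons, an output layer of 2 linear neurons (one per target function), squared error, and plain gradient descent. The learning rate, sampling range, and weight layout are placeholders.

```cpp
// Minimal CPU sketch of the setup described above -- my own reconstruction,
// not the asker's CUDA code. Architecture assumed: 2 inputs -> 2 linear
// hidden neurons -> 2 linear outputs (one per target function).
#include <cstdio>
#include <cstdlib>
#include <cmath>

static double frand() { return (double)rand() / RAND_MAX - 0.5; }  // [-0.5, +0.5]

int main() {
    // Per-neuron weights: [weight for first input, weight for second input, bias].
    double wh[2][3], wo[2][3];
    for (int n = 0; n < 2; ++n)
        for (int k = 0; k < 3; ++k) { wh[n][k] = frand(); wo[n][k] = frand(); }

    const double lr = 1e-4;  // small learning rate (the question used 1e-6)

    for (int epoch = 0; epoch < 20000; ++epoch) {
        double i = frand() * 4.0, j = frand() * 4.0;        // sample inputs in [-2, 2]
        double t[2] = { 3*i + 7*j + 9, i*i + j*j + 24 };    // the two target functions

        // Forward pass, linear activations throughout.
        double h[2], y[2];
        for (int n = 0; n < 2; ++n) h[n] = wh[n][0]*i + wh[n][1]*j + wh[n][2];
        for (int n = 0; n < 2; ++n) y[n] = wo[n][0]*h[0] + wo[n][1]*h[1] + wo[n][2];

        // Backward pass for E = 0.5 * sum_n (y[n] - t[n])^2.
        double dy[2] = { y[0] - t[0], y[1] - t[1] };           // dE/dy
        double dh[2] = { wo[0][0]*dy[0] + wo[1][0]*dy[1],      // dE/dh
                         wo[0][1]*dy[0] + wo[1][1]*dy[1] };

        // Gradient *descent*: note the minus sign on every update.
        for (int n = 0; n < 2; ++n) {
            wo[n][0] -= lr * dy[n] * h[0];
            wo[n][1] -= lr * dy[n] * h[1];
            wo[n][2] -= lr * dy[n];
            wh[n][0] -= lr * dh[n] * i;
            wh[n][1] -= lr * dh[n] * j;
            wh[n][2] -= lr * dh[n];
        }

        if (epoch % 2000 == 0)
            printf("epoch %5d  |error| = %7.3f  %7.3f\n", epoch, fabs(dy[0]), fabs(dy[1]));
    }
    return 0;
}
```

One thing worth noting about this setup: with purely linear activations the two layers compose into a single affine map, so even a perfectly stable run can only fit the linear target exactly; the quadratic one is matched only in a least-squares sense over the sampled range, which fits the observation that a single layer approximated these functions at least as well.
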
I have checked and double-checked my code, but there doesn't seem to be any problem with it.

So here is my question: what is going on here?

Any pointers would be appreciated.

+4

2 answers
  • If the problem you are trying to solve is a classification problem, try a 3-layer network (3 layers are enough according to Kolmogorov). The connections from inputs A and B to a hidden node C (C = A*wa + B*wb) represent a line in AB space. That line divides the correct and incorrect half-spaces. The connections from the hidden layer to the output then combine the hidden-layer values with each other, giving you the desired result (see the first sketch after this list).

  • Depending on your data, the error surface may look like a hairbrush, so momentum should help. Keeping the learning rate at 1 turned out to be optimal for me.

  • Your training sessions will get stuck in local minima every once in a while, so the training will consist of several successive sessions. If a session exceeds the maximum number of iterations, or the amplitude gets too high, or the error is clearly too large, the session has failed; start another one (see the second sketch after this list).

  • At the beginning of each session, re-initialize your weights with random values (between -0.5 and +0.5).

  • It really helps to plot the error. You will get that "Aha!" moment.
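
A small toy illustration of the first point (plain C++, with made-up weights, not code from either post): a hidden node C = A*wa + B*wb + bias defines a line in the AB plane, and the sign of C tells you which half-space a point falls into.

```cpp
// Toy example: one hidden node acting as a linear separator in the AB plane.
// The weights are hand-picked for the line A + B = 1; everything here is
// illustrative only.
#include <cstdio>

int main() {
    double wa = 1.0, wb = 1.0, bias = -1.0;   // C = A + B - 1, i.e. the line A + B = 1
    double points[4][2] = { {0.0, 0.0}, {1.0, 1.0}, {0.2, 0.3}, {0.9, 0.8} };

    for (auto &p : points) {
        double C = p[0] * wa + p[1] * wb + bias;   // hidden-node activation
        // C > 0 on one side of the line, C < 0 on the other; an output layer
        // then combines several such half-space tests into a classification.
        printf("(%.1f, %.1f) -> C = %+.2f -> %s half-space\n",
               p[0], p[1], C, C > 0 ? "positive" : "negative");
    }
    return 0;
}
```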

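And a hedged sketch of the session recipe from the remaining points, again in plain C++: re-initialize the weights in [-0.5, +0.5] at the start of each session, update them with a momentum term, and restart whenever the iteration limit is hit or the error blows up. The objective below is a stand-in so the sketch actually runs; in the real code it would be one backprop pass over the training set, and every threshold and size here is a placeholder.

```cpp
#include <cstdio>
#include <cstdlib>
#include <cmath>
#include <vector>
#include <functional>

// err_grad(w, g): fill g with dE/dw and return the current error E.
using ErrGrad = std::function<double(const std::vector<double>&, std::vector<double>&)>;

// One training session: returns true if it converged, false if it should be restarted.
bool run_session(const ErrGrad &err_grad, size_t n_weights,
                 double lr, double mu, int max_iters, double err_limit) {
    // Fresh random weights in [-0.5, +0.5] at the start of every session.
    std::vector<double> w(n_weights), v(n_weights, 0.0), g(n_weights);
    for (double &x : w) x = (double)rand() / RAND_MAX - 0.5;

    for (int it = 0; it < max_iters; ++it) {
        double err = err_grad(w, g);
        if (!std::isfinite(err) || err > err_limit)
            return false;                      // error blew up: session failed
        for (size_t k = 0; k < w.size(); ++k) {
            v[k] = mu * v[k] - lr * g[k];      // momentum ("velocity") update
            w[k] += v[k];
        }
        if (err < 1e-6) return true;           // good enough: session succeeded
    }
    return false;                              // ran out of iterations: restart
}

int main() {
    // Stand-in objective (minimize sum of w_k^2) so the control flow can be run.
    ErrGrad toy = [](const std::vector<double> &w, std::vector<double> &g) {
        double e = 0.0;
        for (size_t k = 0; k < w.size(); ++k) { g[k] = 2.0 * w[k]; e += w[k] * w[k]; }
        return e;
    };
    int restarts = 0;
    while (!run_session(toy, 12, /*lr=*/0.05, /*mu=*/0.9,
                        /*max_iters=*/10000, /*err_limit=*/1e6))
        ++restarts;
    printf("converged after %d restarted sessions\n", restarts);
    return 0;
}
```
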
+5

The most common reason for a neural network implementation to diverge is that the coder forgot the negative sign in the weight-update expression.

Another possible reason is a problem with the error expression used to calculate the gradients.
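
For a concrete picture of both points, here is a toy single-unit example (plain C++, my own illustration, not the asker's code): squared error E = 0.5*(y - t)^2, its gradient, and the update with the minus sign that gradient descent requires.

```cpp
// One linear unit y = w*x + b trained on a single example by gradient descent.
#include <cstdio>

int main() {
    double w = 0.1, b = 0.0, lr = 0.01;
    double x = 2.0, t = 13.0;                 // input x, target t

    for (int step = 0; step < 200; ++step) {
        double y  = w * x + b;                // forward pass
        double dy = y - t;                    // dE/dy for E = 0.5*(y - t)^2
        // Gradient descent must SUBTRACT the gradient. Writing `+=` here, or
        // computing dy as (t - y) while still subtracting, flips the sign and
        // makes the weights run away instead of settling.
        w -= lr * dy * x;                     // dE/dw = (y - t) * x
        b -= lr * dy;                         // dE/db = (y - t)
    }
    printf("w = %.3f, b = %.3f, prediction = %.3f (target %.1f)\n",
           w, b, w * x + b, t);
    return 0;
}
```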

If neither of those turns out to be the issue, we would need to see the code to answer.

+3
