Problems with implementing neural network backpropagation

I have been reading a lot about neural networks and training them with backpropagation, primarily this Coursera course, with additional reading from here and here. I thought I had a fairly solid grasp of the core algorithm, but my attempt to build a network trained with backpropagation isn't working, and I'm not sure why.

The code is in C++ and is not vectorized yet.

I wanted to create a simple network with 2 input neurons, 1 hidden neuron, and 1 output neuron, to model the AND function, just to understand how the concepts work before moving on to a more complex example. My forward propagation code worked when I hand-coded the values for the weights and biases.

    float NeuralNetwork::ForwardPropagte(const float *dataInput)
    {
        int number = 0;

        // Write the input data into the input layer
        for (auto & node : m_Network[0])
        {
            node->input = dataInput[number++];
        }

        // For each layer in the network
        int layerIndex = 0;
        for (auto & layer : m_Network)
        {
            // For each neuron in the layer
            for (auto & neuron : layer)
            {
                float activation;
                if (layerIndex != 0)
                {
                    neuron->input += neuron->bias;
                    activation = Sigmoid(neuron->input);
                }
                else
                {
                    // Input layer: pass the raw input straight through
                    activation = neuron->input;
                }

                // Accumulate the weighted activation into each connected neuron in the next layer
                for (auto & pair : neuron->outputNeuron)
                {
                    pair.first->input += static_cast<float>(pair.second) * activation;
                }
            }
            ++layerIndex;
        }

        return Sigmoid(m_Network[m_Network.size() - 1][0]->input);
    }

Some of these variables are rather poorly named, but basically: neuron->outputNeuron is a vector of pairs, where the first element is a pointer to a neuron in the next layer and the second is the weight of that connection. neuron->input is the "z" value in the neural network equations, i.e. the sum of all incoming weighted activations plus the bias. Sigmoid is defined as:

    float NeuralNetwork::Sigmoid(float value) const
    {
        return 1.0f / (1.0f + exp(-value));
    }

Both of these seem to work as intended. After a pass through the network, all of the "z" values (neuron->input) are reset to zero (and again after backpropagation).
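For reference, here is a stripped-down sketch of how the neuron and network structures are laid out (simplified; the real classes have a few more members, but the field names match the ones used above):

    #include <utility>
    #include <vector>

    struct Neuron
    {
        float input = 0.0f;  // "z": accumulated weighted sum of incoming activations (+ bias)
        float bias  = 0.0f;
        float error = 0.0f;  // delta computed during backpropagation

        // Connections to the next layer: (pointer to the target neuron, connection weight)
        std::vector<std::pair<Neuron*, float>> outputNeuron;
    };

    // m_Network is a vector of layers, each layer being a vector of neurons
    using Layer   = std::vector<Neuron*>;
    using Network = std::vector<Layer>;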

I then train the network following the pseudocode below. The training code is run multiple times.

    for trainingExample = 0 to m   // m = number of training examples
        perform forward propagation to calculate hyp(x)
        calculate cost delta of last layer
            delta = y - hyp(x)
        use the delta of the output to calculate delta for all layers
        move over the network adjusting the weights based on this value
        reset network

Actual code here:

    void NeuralNetwork::TrainNetwork(const std::vector<std::pair<std::pair<float,float>,float>> & trainingData)
    {
        for (int i = 0; i < 100; ++i)
        {
            for (auto & trainingSet : trainingData)
            {
                float x[2] = { trainingSet.first.first, trainingSet.first.second };
                float y    = trainingSet.second;

                float estimatedY = ForwardPropagte(x);

                // Error (delta) of the single output neuron
                m_Network[m_Network.size() - 1][0]->error = estimatedY - y;

                CalculateError();
                RunBackpropagation();
                ResetActivations();
            }
        }
    }
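ResetActivations isn't shown above; it just zeroes the accumulated "z" values so the next forward pass starts from a clean slate, roughly like this:

    void NeuralNetwork::ResetActivations()
    {
        // Clear the accumulated weighted sums ("z" values) for every neuron
        for (auto & layer : m_Network)
        {
            for (auto & node : layer)
            {
                node->input = 0.0f;
            }
        }
    }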

With the backpropagation function defined by:

    void NeuralNetwork::RunBackpropagation()
    {
        for (int index = m_Network.size() - 1; index >= 0; --index)
        {
            for (auto & node : m_Network[index])
            {
                // Again, "outputNeuron" is the list of next-layer neurons and their associated weights
                for (auto & weight : node->outputNeuron)
                {
                    weight.second += weight.first->error * Sigmoid(node->input);
                }

                // I'm not sure how to adjust the bias; some of the formulas seemed to point to this. Is it correct?
                node->bias = node->error;
            }
        }
    }

and the error for each hidden layer is calculated by:

    void NeuralNetwork::CalculateError()
    {
        for (int index = m_Network.size() - 2; index > 0; --index)
        {
            for (auto & node : m_Network[index])
            {
                node->error = 0.0f;

                // Derivative of the sigmoid evaluated at this neuron's "z"
                float sigmoidPrime = Sigmoid(node->input) * (1 - Sigmoid(node->input));

                for (auto & weight : node->outputNeuron)
                {
                    node->error += (weight.first->error * weight.second) * sigmoidPrime;
                }
            }
        }
    }
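(The intent of this loop is the standard hidden-layer delta: node->error = (sum over the next layer of weight * nextError) * sigmoid(z) * (1 - sigmoid(z)), where z is the node's accumulated input.)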

I randomize the weights and run the network on this dataset:

    x = {0.0f, 0.0f}   y = 0.0f
    x = {1.0f, 0.0f}   y = 0.0f
    x = {0.0f, 1.0f}   y = 0.0f
    x = {1.0f, 1.0f}   y = 1.0f
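For completeness, this is roughly how that dataset is built and fed to TrainNetwork (a sketch; how the weights get their random initial values is omitted here):

    // Each element is ((x1, x2), y)
    std::vector<std::pair<std::pair<float,float>,float>> trainingData = {
        { { 0.0f, 0.0f }, 0.0f },
        { { 1.0f, 0.0f }, 0.0f },
        { { 0.0f, 1.0f }, 0.0f },
        { { 1.0f, 1.0f }, 1.0f },
    };

    NeuralNetwork network;  // weights/biases are randomized elsewhere before training
    network.TrainNetwork(trainingData);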

Of course, I shouldn't train and test with the same dataset, but I just wanted to get the basic backpropagation algorithm running. When I run this code, the randomized weights/biases look like this:

    Layer 0
        Bias 0.111129   NeuronWeight 0.058659
        Bias -0.037814  NeuronWeight -0.018420
    Layer 1
        Bias 0.016230   NeuronWeight -0.104935
    Layer 2
        Bias 0.080982

A training run is performed, and the mean squared error of delta[outputLayer] looks something like this:

    Error: 0.156954
    Error: 0.152529
    Error: 0.213887
    Error: 0.305257
    Error: 0.359612
    Error: 0.373494
    Error: 0.374910
    Error: 0.374995
    Error: 0.375000
    ... and it remains at this value forever ...

And the final weights end up like this (they always settle at roughly these values):

    Layer 0
        Bias 0.000000   NeuronWeight 15.385233
        Bias 0.000000   NeuronWeight 16.492933
    Layer 1
        Bias 0.000000   NeuronWeight 293.518585
    Layer 2
        Bias 0.000000

I accept that this may seem like a roundabout way to learn about neural networks, and that the implementation is (for now) far from optimal. But can anyone spot where I am making an invalid assumption, or where the implementation or the formulas are wrong?

EDIT

Thanks for the feedback about the bias values: I stopped applying them to the input layer and stopped passing the input layer through the sigmoid function. On top of that, my sigmoid function was invalid. But the network still doesn't work; I've updated the error output and weights above to show what is happening now.

+5
2 answers

I solved my problem (on top of the initial bias/sigmoid issues above). The fix was to subtract from the weights rather than add to them. The sources I was looking at had a minus sign inside their calculation of the delta value, which I do not have, but I had kept their convention of adding the value to the weights, so my update was effectively moving in the wrong direction. I was also confused about what to do with the bias and misread one source that said to assign the error to it. I now see that the intuition is to treat the bias like a normal weight, just multiplied by the constant 1 instead of by z. After making these changes, iterating over the training set ~1000 times was able to model simple bitwise expressions such as OR and AND.
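For anyone hitting the same wall, the corrected update ended up looking roughly like this (a sketch rather than my exact code; the explicit learning rate is something added here and its value is just an example):

    void NeuralNetwork::RunBackpropagation()
    {
        const float learningRate = 0.5f;  // example value, tune as needed

        for (int index = m_Network.size() - 1; index >= 0; --index)
        {
            for (auto & node : m_Network[index])
            {
                // The input layer's activation is the raw input; other layers use the sigmoid
                float activation = (index == 0) ? node->input : Sigmoid(node->input);

                // Gradient descent: subtract the gradient instead of adding it
                for (auto & weight : node->outputNeuron)
                {
                    weight.second -= learningRate * weight.first->error * activation;
                }

                // The bias acts like a normal weight whose incoming value is the constant 1
                if (index != 0)
                {
                    node->bias -= learningRate * node->error * 1.0f;
                }
            }
        }
    }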

+1

As was already said, you have too many biases. You don't need a bias in the last layer: it is an output layer, and a bias must be connected to its input, but not to its output. Take a look at the following image: HTRwG.png

In this image you can see that there is only one bias per layer, except for the last one, where no bias is needed.
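Written out by hand for the 2-1-1 network in the question, there are only two bias terms, one contributed by each layer that feeds forward, and none associated with the output layer (the names below are purely illustrative, not from the original code):

    #include <cmath>

    // Illustrative only: the 2-1-1 network written out long-hand.
    // b0 is the bias contributed by the input layer, b1 by the hidden layer;
    // the output layer feeds nothing further, so it contributes no bias.
    float Predict(float x1, float x2,
                  float w1, float w2, float b0,   // input  -> hidden
                  float w3, float b1)             // hidden -> output
    {
        auto sigmoid = [](float v) { return 1.0f / (1.0f + std::exp(-v)); };

        float zHidden = w1 * x1 + w2 * x2 + b0;   // bias feeds the hidden neuron's input
        float aHidden = sigmoid(zHidden);

        float zOutput = w3 * aHidden + b1;        // bias feeds the output neuron's input
        return sigmoid(zOutput);                  // nothing comes after the output neuron
    }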

Here you can read about a very intuitive approach to neural networks. It is written in Python, but it can help you get a better grasp of some neural network concepts.

+4

Source: https://habr.com/ru/post/1236170/

