I've read a lot about neural networks and training them with backpropagation, primarily this Coursera course, with additional reading from here and here. I thought I had a pretty strong understanding of the basic algorithm, but my attempt to build a network trained with backpropagation doesn't work, and I'm not sure why.
The code is in C++, with no vectorization yet.
I wanted to create a simple network (2 input neurons, 1 hidden neuron, 1 output neuron) to model the AND function, just to make sure I understood the concepts before moving on to a more complex example. My forward propagation code worked when I hand-coded the values for the weights and biases.
float NeuralNetwork::ForwardPropagte(const float *dataInput)
{
    // Write the input data into the input layer
    int number = 0;
    for (auto & node : m_Network[0]) {
        node->input = dataInput[number++];
    }

    // For each layer in the network
    int layerIndex = 0;
    for (auto & layer : m_Network) {
        // For each neuron in the layer
        for (auto & neuron : layer) {
            float activation;
            if (layerIndex != 0) {
                neuron->input += neuron->bias;
                activation = Sigmoid(neuron->input);
            } else {
                activation = neuron->input;
            }

            // Accumulate the weighted activation into each connected neuron
            for (auto & pair : neuron->outputNeuron) {
                pair.first->input += static_cast<float>(pair.second) * activation;
            }
        }
        ++layerIndex;
    }

    return Sigmoid(m_Network[m_Network.size() - 1][0]->input);
}
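For context, it's called with a two-element input array, along these lines (net is just an illustrative instance name):

float x[2] = {1.0f, 1.0f};                 // the two inputs of one training example
float estimatedY = net.ForwardPropagte(x); // 'net' is a NeuralNetwork instance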
Some of these variables are rather poorly named, but basically, neuron->outputNeuron is a vector of pairs: the first element is a pointer to the next neuron, and the second is the weight on that connection. neuron->input is the "z" value in the neural network equations, i.e. the sum of all weighted activations plus the bias. Sigmoid is defined as:
float NeuralNetwork::Sigmoid(float value) const
{
    return 1.0f / (1.0f + exp(-value));
}
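For reference, this is roughly the Neuron structure these functions operate on (a sketch based on the members used above; the exact weight type, pointer ownership and the rest of the class aren't shown here, so treat those details as assumptions):

#include <utility>
#include <vector>

struct Neuron
{
    float input = 0.0f;   // "z": accumulated weighted inputs plus bias
    float bias  = 0.0f;
    float error = 0.0f;   // delta used during backpropagation
    // Connections to the next layer: (pointer to target neuron, weight on that edge)
    std::vector<std::pair<Neuron*, float>> outputNeuron;
};

// m_Network holds the layers, roughly:
// std::vector<std::vector<Neuron*>> m_Network;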
Both of these seem to work as intended. After a pass through the network (or after backpropagation), all the "z" / neuron->input values are reset to zero.
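ResetActivations, which shows up in the training loop below, is only meant to zero those accumulators; roughly something like this (a sketch, not necessarily my exact code):

void NeuralNetwork::ResetActivations()
{
    // Clear the accumulated "z" values so the next forward pass starts fresh
    for (auto & layer : m_Network) {
        for (auto & neuron : layer) {
            neuron->input = 0.0f;
        }
    }
}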
I then train the network following the pseudocode below. The training code is run multiple times.
for trainingExample = 0 to m   // m = number of training examples
    perform forward propagation to calculate hyp(x)
    calculate cost delta of last layer
        delta = y - hyp(x)
    use the delta of the output to calculate delta for all layers
    move over the network adjusting the weights based on this value
    reset network
Actual code here:
void NeuralNetwork::TrainNetwork(const std::vector<std::pair<std::pair<float,float>,float>> & trainingData)
{
    for (int i = 0; i < 100; ++i) {
        for (auto & trainingSet : trainingData) {
            float x[2] = {trainingSet.first.first, trainingSet.first.second};
            float y = trainingSet.second;

            float estimatedY = ForwardPropagte(x);
            m_Network[m_Network.size() - 1][0]->error = estimatedY - y;

            CalculateError();
            RunBackpropagation();
            ResetActivations();
        }
    }
}
With the backpropagation function defined by:
void NeuralNetwork::RunBackpropagation()
{
    for (int index = m_Network.size() - 1; index >= 0; --index) {
        for (auto & node : m_Network[index]) {
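The body of that loop is the "adjust the weights based on this value" step from the pseudocode above. A minimal sketch of the kind of gradient-descent update I mean (the helper name ApplyGradientStep and learningRate are purely illustrative, not my exact code):

// Illustrative sketch only: plain gradient-descent update with a fixed learning rate.
void NeuralNetwork::ApplyGradientStep(float learningRate)
{
    for (int index = m_Network.size() - 1; index >= 0; --index) {
        for (auto & node : m_Network[index]) {
            // Activation of this node (input-layer values pass through unchanged)
            float activation = (index != 0) ? Sigmoid(node->input) : node->input;

            // w -= rate * delta_of_target * activation, for every outgoing weight
            for (auto & pair : node->outputNeuron) {
                pair.second -= learningRate * pair.first->error * activation;
            }

            // b -= rate * delta, for every non-input node
            if (index != 0) {
                node->bias -= learningRate * node->error;
            }
        }
    }
}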
and the error (delta) for each hidden layer is calculated by:
void NeuralNetwork::CalculateError()
{
    for (int index = m_Network.size() - 2; index > 0; --index) {
        for (auto & node : m_Network[index]) {
            node->error = 0.0f;

            float sigmoidPrime = Sigmoid(node->input) * (1 - Sigmoid(node->input));

            for (auto & weight : node->outputNeuron) {
                node->error += (weight.first->error * weight.second) * sigmoidPrime;
            }
        }
    }
}
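As far as I understand it, this is meant to implement the standard per-node backpropagation recurrence for the hidden layers:

\delta_j^{(l)} = \sigma'\!\left(z_j^{(l)}\right) \sum_k w_{jk}\, \delta_k^{(l+1)},
\qquad \sigma'(z) = \sigma(z)\bigl(1 - \sigma(z)\bigr)

where δ_k^{(l+1)} corresponds to weight.first->error, w_{jk} to weight.second, σ'(z) to the sigmoidPrime term, and the output-layer delta is set directly in TrainNetwork.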
I randomize the weights and run the network on this dataset:
x = {0.0f, 0.0f}  y = 0.0f
x = {1.0f, 0.0f}  y = 0.0f
x = {0.0f, 1.0f}  y = 0.0f
x = {1.0f, 1.0f}  y = 1.0f
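In code, that dataset is just a vector matching the TrainNetwork signature; roughly (variable and class usage here are illustrative, and I'm assuming the weights are randomized when the network is constructed):

// AND-function training set: ((input1, input2), expected output)
std::vector<std::pair<std::pair<float, float>, float>> trainingData = {
    {{0.0f, 0.0f}, 0.0f},
    {{1.0f, 0.0f}, 0.0f},
    {{0.0f, 1.0f}, 0.0f},
    {{1.0f, 1.0f}, 1.0f},
};

NeuralNetwork network;  // weights/biases randomized at construction (assumed)
network.TrainNetwork(trainingData);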
Of course, I shouldn't train and test on the same dataset, but I just wanted to get the basic backpropagation algorithm running. When I run this code, the weights/biases start out like this:
Layer 0
    Bias 0.111129
    NeuronWeight 0.058659
    Bias -0.037814
    NeuronWeight -0.018420
Layer 1
    Bias 0.016230
    NeuronWeight -0.104935
Layer 2
    Bias 0.080982
The training set runs, and the mean squared error of delta[outputLayer] looks something like this:
Error: 0.156954
Error: 0.152529
Error: 0.213887
Error: 0.305257
Error: 0.359612
Error: 0.373494
Error: 0.374910
Error: 0.374995
Error: 0.375000
... remains at this value forever ...
And the final weights look like this (they always end up at roughly these values):
Layer 0
    Bias 0.000000
    NeuronWeight 15.385233
    Bias 0.000000
    NeuronWeight 16.492933
Layer 1
    Bias 0.000000
    NeuronWeight 293.518585
Layer 2
    Bias 0.000000
I accept that this may seem like a roundabout way to learn neural networks, and that the implementation is (at the moment) very suboptimal. But can anyone spot where I'm making a wrong assumption, or where the implementation or the formulas are wrong?
EDIT
Thanks for the feedback about the bias values. I've stopped applying them to the input layer and stopped passing the input layer through the sigmoid function. My sigmoid function was also incorrect. But the network still doesn't work. I've updated the errors and output above to show what happens now.