I am trying to understand backpropagation in a simple 3-layer neural network trained on MNIST.
There is an input layer with weights and a bias. The labels are MNIST digits, so the target is a vector of 10 classes.
The second layer is a linear transform. The third layer is a softmax activation that turns the output into probabilities.
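In code, the forward pass I have in mind looks roughly like this (just a sketch of my setup; the names, shapes, and the forward_pass helper are my own assumptions, and softmax is the function shown further down):

import numpy as np

# Sketch of the forward pass as I picture it (shapes are assumptions):
# x: (784,) flattened MNIST image, W: (10, 784), b: (10,)
def forward_pass(x, W, b):
    logits = W @ x + b       # linear transform
    return softmax(logits)   # probabilities over the 10 classes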
Backpropagation computes the derivative at each step and calls it a gradient.
Each earlier layer combines the global (upstream) gradient coming from the layer above with its own local gradient. I'm having trouble computing the local gradient of softmax.
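To make sure I'm describing that chaining correctly, this is roughly how I picture backprop through one layer (my own sketch; upstream_grad and local_jacobian are just placeholder names I made up):

# Chain rule for one layer, as I understand it:
# dLoss/dInput = dLoss/dOutput @ dOutput/dInput
def backward_through_layer(upstream_grad, local_jacobian):
    # upstream_grad: shape (n_outputs,)           - gradient from the layer above
    # local_jacobian: shape (n_outputs, n_inputs) - this layer's local gradient
    return upstream_grad @ local_jacobian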
Several resources on the Internet explain softmax and its derivatives and even provide code samples for softmax itself.
import numpy as np

def softmax(x):
    """Compute the softmax of vector x."""
    exps = np.exp(x)
    return exps / np.sum(exps)
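If I read those explanations correctly, the derivative they describe has two cases (writing S for the softmax output; this is my own reading, so it may be off):

dS[i]/dx[j] = S[i] * (1 - S[i])    when i == j
dS[i]/dx[j] = -S[i] * S[j]         when i != j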
The derivative is explained in terms of the two cases above, i = j and i != j. This is a simple piece of code that I came up with, hoping to test my understanding:
def softmax(self, x):
    """Compute the softmax of vector x."""
    exps = np.exp(x)
    return exps / np.sum(exps)

def forward(self):
    self.value = self.softmax(self.input)

def backward(self):
    for i in range(len(self.value)):
        for j in range(len(self.input)):
            if i == j:
                # diagonal case of the derivative
                self.gradient[i] = self.value[i] * (1 - self.input[i])
            else:
                # off-diagonal case of the derivative
                self.gradient[i] = -self.value[i] * self.input[j]
So self.gradient here, the local gradient, ends up being a vector. Is that right? Is there a better way to write this?
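For comparison, this is the vectorized version I was considering, building the full Jacobian from the two cases above and using the softmax output rather than the input (again just my own attempt, so it may be wrong too):

def backward(self):
    s = self.value
    # full Jacobian: s[i] * (1 - s[i]) on the diagonal, -s[i] * s[j] elsewhere
    self.gradient = np.diag(s) - np.outer(s, s)

Would that be the preferred way, and should the local gradient be a matrix like this instead of a vector?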