Numpy: compute derivative of softmax function

I am trying to understand backpropagation in a simple 3-layer neural network trained on MNIST.

There is an input layer with weights and a bias. The labels are MNIST digits, so the target is a vector of 10 classes.

The second layer is a linear transform. The third layer is a softmax activation to get the output as probabilities.
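As a point of reference, here is a minimal sketch of that forward pass, assuming flattened 28x28 MNIST inputs; the shapes and names (W, b, logits) are my own illustration, not from the original post:

import numpy as np

x = np.random.rand(784)               # one flattened 28x28 MNIST image
W = np.random.randn(10, 784) * 0.01   # weights: 10 classes x 784 pixels
b = np.zeros(10)                      # bias, one per class

logits = W @ x + b                    # linear transform
exps = np.exp(logits)
probs = exps / np.sum(exps)           # softmax: probabilities that sum to 1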

Backpropagation calculates the derivative at each step and calls it the gradient.

Previous layers combine the global (or upstream) gradient with their local gradient. I'm having trouble calculating the local gradient of the softmax.
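One common way that "combine the upstream gradient with the local gradient" is wired up is a chain-rule multiplication by the layer's Jacobian; the snippet below only illustrates that bookkeeping, and the names upstream_grad and local_jacobian are mine:

import numpy as np

upstream_grad = np.random.rand(10)        # dL/dy arriving from the layer above
local_jacobian = np.random.rand(10, 10)   # dy/dx of this layer (n x n for softmax)

# chain rule: dL/dx = J^T @ dL/dy, which is what gets passed to the previous layer
downstream_grad = local_jacobian.T @ upstream_grad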

Several resources on the Internet explain softmax and its derivative, and even provide code samples for softmax itself:

import numpy as np

def softmax(x):
    """Compute the softmax of vector x."""
    exps = np.exp(x)
    return exps / np.sum(exps)

The derivative is explained in terms of the case when i == j and the case when i != j. This is a simple piece of code that I came up with, hoping to test my understanding:

def softmax(self, x):
    """Compute the softmax of vector x."""
    exps = np.exp(x)
    return exps / np.sum(exps)

def forward(self):
    # self.input is a vector of length 10
    # and is the output of 
    # (w * x) + b
    self.value = self.softmax(self.input)

def backward(self):
    for i in range(len(self.value)):
        for j in range(len(self.input)):
            if i == j:
                self.gradient[i] = self.value[i] * (1 - self.input[i])
            else:
                self.gradient[i] = -self.value[i] * self.input[j]

Then self.gradient there is the local gradient, and it is a vector. Is that right? Is there a better way to write this?

+4
3 answers

I am assuming you have a 3-layer NN with W1, b1 for the weights and bias from the input layer to the hidden layer, and W2, b2 from the hidden layer to the output layer. Z1 and Z2 are the inputs to the hidden and output layers, a1 and a2 are their outputs, and a2 is the predicted output. delta3 and delta2 are the (backpropagated) errors, from which you get the gradients of the loss with respect to the parameters.

[images: the backpropagation equations for this network]

This is the general scenario for a 3-layer NN (an input layer, one hidden layer and one output layer). You can follow the procedure described above to compute the gradients, which should be easy to compute! The other answer here already addresses the problem in your code.
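A minimal sketch of the gradients described above, assuming a tanh hidden layer, a softmax output and cross-entropy loss (the hidden activation and the loss are my assumptions; the equations in the images are what this is meant to mirror):

import numpy as np

def gradients(X, y_onehot, W1, b1, W2, b2):
    # forward pass
    Z1 = X @ W1 + b1                      # input to the hidden layer
    a1 = np.tanh(Z1)                      # hidden activation
    Z2 = a1 @ W2 + b2                     # input to the output layer
    exps = np.exp(Z2 - Z2.max(axis=1, keepdims=True))
    a2 = exps / exps.sum(axis=1, keepdims=True)   # softmax, the predicted output

    # backward pass
    delta3 = a2 - y_onehot                # output error for softmax + cross-entropy
    dW2 = a1.T @ delta3
    db2 = delta3.sum(axis=0)
    delta2 = (delta3 @ W2.T) * (1 - a1 ** 2)      # tanh'(Z1) = 1 - tanh(Z1)^2
    dW1 = X.T @ delta2
    db1 = delta2.sum(axis=0)
    return dW1, db1, dW2, db2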

+10

As mentioned, you have n^2 partial derivatives to compute, so the local gradient of softmax is an n x n Jacobian matrix rather than a length-n vector.

Specifically, dSM[i]/dx[k] = SM[i] * (dx[i]/dx[k] - SM[k]), where dx[i]/dx[k] is 1 when i == k and 0 otherwise.
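For reference, that identity comes straight from the quotient rule; this derivation is written out here for clarity and is not part of the original answer:

$$
s_i = \frac{e^{x_i}}{\sum_k e^{x_k}}, \qquad
\frac{\partial s_i}{\partial x_j}
  = \frac{\delta_{ij}\, e^{x_i} \sum_k e^{x_k} - e^{x_i} e^{x_j}}{\Big(\sum_k e^{x_k}\Big)^{2}}
  = s_i\,(\delta_{ij} - s_j)
$$

where $\delta_{ij}$ is 1 when $i = j$ and 0 otherwise.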

So you should have:

        if i == j:
            self.gradient[i,j] = self.value[i] * (1 - self.value[i])
        else:
            self.gradient[i,j] = -self.value[i] * self.value[j]

instead of:

        if i == j:
            self.gradient[i] = self.value[i] * (1 - self.input[i])
        else:
            self.gradient[i] = -self.value[i] * self.input[j]

You can also compute the whole Jacobian at once instead of looping:

SM = self.value.reshape((-1,1))
jac = np.diag(self.value) - np.dot(SM, SM.T)
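A quick way to convince yourself that both forms agree is to check the Jacobian against finite differences; this is my own verification snippet, reusing a softmax like the one in the question:

import numpy as np

def softmax(x):
    exps = np.exp(x)
    return exps / np.sum(exps)

x = np.random.rand(10)
s = softmax(x)

jac = np.diag(s) - np.outer(s, s)       # analytic Jacobian: diag(s) - s s^T

eps = 1e-6
num = np.zeros((10, 10))
for k in range(10):
    d = np.zeros(10)
    d[k] = eps
    num[:, k] = (softmax(x + d) - softmax(x - d)) / (2 * eps)  # central differences

print(np.allclose(jac, num, atol=1e-6))  # True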
+6

np.exp on its own is not numerically stable, because for large inputs it overflows to Inf. So you should subtract the maximum of x:

def softmax(x):
    """Compute the softmax of vector x."""
    # shifting by x.max() leaves the result unchanged but avoids overflow in exp
    exps = np.exp(x - x.max())
    return exps / np.sum(exps)
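A small illustration of the difference (not from the original answer): with large inputs the unshifted version overflows, while the shifted one is fine.

import numpy as np

x = np.array([1000.0, 1000.0])

naive = np.exp(x) / np.sum(np.exp(x))                        # nan: exp(1000) overflows to inf
stable = np.exp(x - x.max()) / np.sum(np.exp(x - x.max()))   # array([0.5, 0.5])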

If x is a matrix, check the softmax function in this notebook: https://github.com/rickiepark/ml-learn/blob/master/notebooks/5.%20multi-layer%20perceptron.ipynb
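If you just need a row-wise version without opening the notebook, a sketch along these lines should work (my own code, assuming each row of x is one sample):

import numpy as np

def softmax_rows(x):
    """Stable row-wise softmax for a 2D array of shape (n_samples, n_classes)."""
    shifted = x - x.max(axis=1, keepdims=True)   # subtract each row's max
    exps = np.exp(shifted)
    return exps / exps.sum(axis=1, keepdims=True)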

+3

Source: https://habr.com/ru/post/1660630/

