I am trying to understand backpropagation in a simple 3-layer neural network trained on MNIST.
There is an input layer with weights and a bias. The labels are MNIST digits, so the target is a vector of 10 classes.
The second layer is a linear transform. The third layer is a softmax activation that turns the output into probabilities.
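In code, the forward pass I have in mind looks roughly like this (just a sketch of my setup; the names, shapes, and the forward_pass helper are my own assumptions, and softmax is the function shown further down):

import numpy as np

# Sketch of the forward pass as I picture it (shapes are assumptions):
# x: (784,) flattened MNIST image, W: (10, 784), b: (10,)
def forward_pass(x, W, b):
    logits = W @ x + b       # linear transform
    return softmax(logits)   # probabilities over the 10 classes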
Backpropagation computes the derivative at each step and calls it a gradient.
Each earlier layer combines the global (upstream) gradient coming from the layer above with its own local gradient. I'm having trouble computing the local gradient of softmax.
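To make sure I'm describing that chaining correctly, this is roughly how I picture backprop through one layer (my own sketch; upstream_grad and local_jacobian are just placeholder names I made up):

# Chain rule for one layer, as I understand it:
# dLoss/dInput = dLoss/dOutput @ dOutput/dInput
def backward_through_layer(upstream_grad, local_jacobian):
    # upstream_grad: shape (n_outputs,)           - gradient from the layer above
    # local_jacobian: shape (n_outputs, n_inputs) - this layer's local gradient
    return upstream_grad @ local_jacobian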
Several resources on the Internet explain softmax and its derivatives and even provide code samples for softmax itself.
import numpy as np

def softmax(x):
    """Compute the softmax of vector x."""
    exps = np.exp(x)
    return exps / np.sum(exps)
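If I read those explanations correctly, the derivative they describe has two cases (writing S for the softmax output; this is my own reading, so it may be off):

dS[i]/dx[j] = S[i] * (1 - S[i])    when i == j
dS[i]/dx[j] = -S[i] * S[j]         when i != j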
The derivative is explained in terms of the two cases above, i = j and i != j. This is a simple piece of code that I came up with, hoping to test my understanding:
def softmax(self, x):
    """Compute the softmax of vector x."""
    exps = np.exp(x)
    return exps / np.sum(exps)

def forward(self):
    self.value = self.softmax(self.input)

def backward(self):
    for i in range(len(self.value)):
        for j in range(len(self.input)):
            if i == j:
                # diagonal case of the derivative
                self.gradient[i] = self.value[i] * (1 - self.input[i])
            else:
                # off-diagonal case of the derivative
                self.gradient[i] = -self.value[i] * self.input[j]
So self.gradient here, the local gradient, ends up being a vector. Is that right? Is there a better way to write this?
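For comparison, this is the vectorized version I was considering, building the full Jacobian from the two cases above and using the softmax output rather than the input (again just my own attempt, so it may be wrong too):

def backward(self):
    s = self.value
    # full Jacobian: s[i] * (1 - s[i]) on the diagonal, -s[i] * s[j] elsewhere
    self.gradient = np.diag(s) - np.outer(s, s)

Would that be the preferred way, and should the local gradient be a matrix like this instead of a vector?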