The Jacobian is the matrix of all first-order partial derivatives of a vector-valued function. In the case of a neural network, it is an N × W matrix, where N is the number of samples in the training set and W is the total number of parameters (weights + biases) of the network. It is built by taking the partial derivative of each output with respect to each weight, giving entries of the form:

J_ij = ∂F(x_i, w) / ∂w_j

where F(x_i, w) is the network function evaluated for the ith input vector of the training set using the weight vector w, and w_j is the jth element of the weight vector w of the network. In traditional Levenberg-Marquardt implementations, the Jacobian is approximated using finite differences. For neural networks, however, it can be computed much more efficiently using the chain rule of calculus and the first derivatives of the activation functions.
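
To make the chain-rule computation concrete, here is a minimal NumPy sketch for a hypothetical one-hidden-layer tanh network F(x) = w2 · tanh(W1 x + b1) + b2 with a scalar output. The function name `jacobian_mlp`, the architecture, and all shapes are illustrative assumptions, not something fixed by the answer above; the finite-difference check at the end just confirms one entry against the slower approximation mentioned earlier:

```python
import numpy as np

def jacobian_mlp(X, W1, b1, w2, b2):
    """Analytic N x W Jacobian of the (assumed) network
    F(x) = w2 . tanh(W1 x + b1) + b2, one row per training sample.
    Parameter order in each row: W1 (flattened row-major), b1, w2, b2."""
    N, D = X.shape
    H = b1.shape[0]
    J = np.zeros((N, H * D + H + H + 1))
    for i in range(N):
        x = X[i]
        a = np.tanh(W1 @ x + b1)       # hidden activations
        dF_dz = w2 * (1.0 - a ** 2)    # chain rule through tanh': 1 - tanh^2
        dF_dW1 = np.outer(dF_dz, x)    # dF/dW1[j,k] = dF/dz[j] * x[k]
        # dF/db1 = dF/dz, dF/dw2 = a, dF/db2 = 1
        J[i] = np.concatenate([dF_dW1.ravel(), dF_dz, a, [1.0]])
    return J

# Illustrative usage with random data (shapes are arbitrary assumptions)
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)
w2, b2 = rng.normal(size=4), 0.1
J = jacobian_mlp(X, W1, b1, w2, b2)    # shape (5, 4*3 + 4 + 4 + 1)

# Finite-difference check of J[0, 0], i.e. dF(x_0)/dW1[0, 0]
eps = 1e-6
F0 = w2 @ np.tanh(W1 @ X[0] + b1) + b2
W1p = W1.copy(); W1p[0, 0] += eps
F1 = w2 @ np.tanh(W1p @ X[0] + b1) + b2
assert abs((F1 - F0) / eps - J[0, 0]) < 1e-4
```

The analytic version needs only one forward pass per sample plus the activation derivatives, whereas finite differences would require one extra forward pass per parameter, which is why the chain-rule approach is preferred for networks.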