Going through the Caffe tutorial: http://caffe.berkeleyvision.org/gathered/examples/mnist.html
I got really confused by the model used in this tutorial, which is different from what I've seen elsewhere (and yet effective). It is defined here: https://github.com/BVLC/caffe/blob/master/examples/mnist/lenet_train_test.prototxt
As I understand it, a convolutional layer in Caffe simply computes Wx + b for each input, without applying any activation function. If we want an activation, we have to add another layer immediately after the convolutional layer, e.g. Sigmoid, TanH, or ReLU. But every paper/tutorial I have read on the Internet applies an activation function to the neurons' outputs.
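For example, if I were to add one myself, I would expect something like the following ReLU layer right after the first convolution. This is just my own sketch of the usual Caffe prototxt syntax, with hypothetical layer names, not something taken from the linked file:

    layer {
      name: "relu1"     # hypothetical name, just for illustration
      type: "ReLU"      # rectified linear activation
      bottom: "conv1"   # reads the convolution output...
      top: "conv1"      # ...and applies ReLU to it in place
    }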
This leaves me with a big question mark, because in this model we only see convolutional and pooling layers alternating, with no activation layers in between. I hope someone can give me an explanation.
On a side note, another doubt I have is the max_iter in this solver:
https://github.com/BVLC/caffe/blob/master/examples/mnist/lenet_solver.prototxt
We have 60,000 images for training and 10,000 images for testing. So why is max_iter here only 10,000 (and it still reaches >99% accuracy)? What does Caffe do in each iteration? Actually, I'm not even sure whether the reported accuracy is (number of correct predictions) / (test set size).
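My guess is that each iteration processes one mini-batch, in which case the numbers would work out like this (assuming the batch sizes from lenet_train_test.prototxt, 64 for TRAIN and 100 for TEST, and test_iter: 100 from the solver; please correct me if I read those wrong):

    10,000 iterations * 64 images/batch = 640,000 training images seen
    640,000 / 60,000 training images    = roughly 10.7 passes (epochs) over the training set

    my guess for the reported accuracy:
    accuracy = correct predictions / (test_iter * test batch size)
             = correct predictions / (100 * 100) = correct / 10,000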
I am very surprised by this example, because I haven't found any other example or framework that achieves such high accuracy in so little time (just 5 minutes to reach 99% accuracy). So I suspect there is something I have misunderstood.
Thanks.