Going through the Caffe tutorial: http://caffe.berkeleyvision.org/gathered/examples/mnist.html
I got really confused by the model used in this tutorial, which is different from what I've seen elsewhere (and yet effective). It is defined here: https://github.com/BVLC/caffe/blob/master/examples/mnist/lenet_train_test.prototxt
As I understand it, a convolutional layer in Caffe simply computes Wx + b for each input, without applying any activation function. If we want an activation, we have to add another layer immediately after the convolutional layer, e.g. Sigmoid, TanH, or ReLU. But every paper/tutorial I have read on the Internet applies an activation function to the neurons' outputs.
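For example, if I were to add one myself, I would expect something like the following ReLU layer right after the first convolution. This is just my own sketch of the usual Caffe prototxt syntax, with hypothetical layer names, not something taken from the linked file:

    layer {
      name: "relu1"     # hypothetical name, just for illustration
      type: "ReLU"      # rectified linear activation
      bottom: "conv1"   # reads the convolution output...
      top: "conv1"      # ...and applies ReLU to it in place
    }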
This leaves me with a big question mark, because in this model we only see convolutional and pooling layers alternating, with no activation layers in between. I hope someone can give me an explanation.
On a side note, another doubt I have is the max_iter in this solver:
https://github.com/BVLC/caffe/blob/master/examples/mnist/lenet_solver.prototxt
We have 60,000 images for training and 10,000 images for testing. So why is max_iter here only 10,000 (and it still reaches >99% accuracy)? What does Caffe do in each iteration? Actually, I'm not even sure whether the reported accuracy is (number of correct predictions) / (test set size).
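My guess is that each iteration processes one mini-batch, in which case the numbers would work out like this (assuming the batch sizes from lenet_train_test.prototxt, 64 for TRAIN and 100 for TEST, and test_iter: 100 from the solver; please correct me if I read those wrong):

    10,000 iterations * 64 images/batch = 640,000 training images seen
    640,000 / 60,000 training images    = roughly 10.7 passes (epochs) over the training set

    my guess for the reported accuracy:
    accuracy = correct predictions / (test_iter * test batch size)
             = correct predictions / (100 * 100) = correct / 10,000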
I am very surprised by this example, because I haven't found any other example or framework that achieves such high accuracy in so little time (just 5 minutes to reach 99% accuracy). So I suspect there is something I have misunderstood.
Thanks.