The answer is explained on the same page:
"The convolution will compute 32 features for each 5x5 patch. Its weight tensor will have a shape of [5, 5, 1, 32]."
The quote itself is terse, so these terms need some explanation:
- Convolution kernel size: 5x5. This means there is a 5x5 matrix that is convolved with the input image by sliding it across the image; at each position, the 5x5 matrix is multiplied element-wise with the cells of the image it covers. Check this link for an explanation of how the small 5x5 matrix moves around the 28x28 image. This gives us the first two digits of [5, 5, 1, 32].
- Input channel size: 1. These are black-and-white images, so there is one input channel. Most color images have 3 channels, so expect a 3 in other convolutional networks that work with images. Accordingly, for the second layer, W_conv2 has 32 input channels, equal to the number of output channels of layer 1.
- Output channels: 32. The last dimension of the weight tensor is perhaps the hardest to visualize. Take your 5x5 matrix and repeat it 32 times! Each of these 32 copies is called a channel. Each of the 32 5x5 matrices is initialized with random weights and learns independently through the network's forward/backward propagation. More channels explore different aspects of the image and therefore give your network additional power.
If you put these 3 points together, you get the desired shape for layer 1. Subsequent layers follow the same pattern: the first two dimensions are the kernel size (5x5 in this case). The third dimension equals the input channel count, which equals the output channel count of the previous layer (32, since we declared 32 output channels for layer 1). The final dimension is the output channel count of the current layer (64, even larger for the second layer, again to support an even greater number of independent 5x5 kernels!).
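To make these shapes concrete, here is a minimal sketch in the TF 1.x-style `tf.nn` API that the tutorial uses (the `stddev=0.1` initialization follows the tutorial's convention; in TensorFlow 2.x the initializer lives at `tf.random.truncated_normal` instead):

```python
import tensorflow as tf  # TF 1.x style, matching the tutorial

# Layer 1: 5x5 kernel, 1 input channel (grayscale), 32 output channels
W_conv1 = tf.Variable(tf.truncated_normal([5, 5, 1, 32], stddev=0.1))

# Layer 2: 5x5 kernel, 32 input channels (layer 1's outputs), 64 output channels
W_conv2 = tf.Variable(tf.truncated_normal([5, 5, 32, 64], stddev=0.1))
```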
Finally, the last two layers: the dense layer is the only one whose input size involves some calculation:
- For each convolution layer (with 'SAME' padding, as used here), final size = initial size
- For a pooling layer of size k x k, final size = initial size / k
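These two rules are simple enough to express directly (a minimal illustration with hypothetical helper names `conv_out` and `pool_out`; the convolution rule assumes 'SAME' padding with stride 1, as in the tutorial):

```python
def conv_out(size):
    # 'SAME'-padded convolution with stride 1 keeps the spatial size
    return size

def pool_out(size, k=2):
    # k x k pooling with stride k divides the spatial size by k
    return size // k

print(pool_out(conv_out(28)))  # 14: one convolution plus one 2x2 pool
```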
So:
- conv1 keeps the size at 28x28
- pool1 reduces it to 14x14
- conv2 keeps the size at 14x14
- pool2 reduces it to 7x7
And, of course, we have 64 channels thanks to conv2; pooling does not affect the channel count. That gives a final dense-layer input of 7x7x64. We then create a fully connected hidden layer of 1024 units and add 10 output classes for the 10 digits.
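Tracing the whole pipeline with plain arithmetic as a sanity check of the numbers above (the 1024 and 10 are the tutorial's dense-layer sizes):

```python
size = 28                        # MNIST input: 28x28 pixels
size = size // 2                 # conv1 keeps 28x28, pool1 halves it -> 14x14
size = size // 2                 # conv2 keeps 14x14, pool2 halves it -> 7x7

dense_input = size * size * 64   # 7 * 7 * 64 = 3136 inputs to the dense layer
print(dense_input)               # 3136 -> 1024 hidden units -> 10 classes
```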