Why is the shape of the filter the same?
First, the kernels all share the same shape mainly to speed up computation: it lets the convolution be applied in a batched fashion, for example via the im2col transform followed by a matrix multiplication. It is also convenient to store all the weights in a single multidimensional array. Mathematically, though, nothing stops you from imagining several filters of different shapes.
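To make this concrete, here is a minimal single-channel sketch (the im2col helper and all names below are just illustrative, not a framework API): because every filter has the same 3×3 shape, the whole filter bank can be flattened into one weight matrix and all feature maps computed with a single matrix multiplication.

```python
import numpy as np

def im2col(x, kh, kw):
    """Unfold every kh x kw patch of a single-channel image x into one column."""
    H, W = x.shape
    out_h, out_w = H - kh + 1, W - kw + 1
    cols = np.empty((kh * kw, out_h * out_w))
    idx = 0
    for i in range(out_h):
        for j in range(out_w):
            cols[:, idx] = x[i:i + kh, j:j + kw].ravel()
            idx += 1
    return cols

rng = np.random.default_rng(0)
image = rng.standard_normal((8, 8))        # toy single-channel input
filters = rng.standard_normal((4, 3, 3))   # 4 filters, all with the same 3x3 shape

cols = im2col(image, 3, 3)                        # (9, 36): one column per patch
weights = filters.reshape(4, -1)                  # (4, 9): one row per filter
feature_maps = (weights @ cols).reshape(4, 6, 6)  # all 4 maps in a single matmul
```

Deep-learning frameworks use essentially this trick (generalized to batches and multiple input channels), which is why a uniform kernel shape is so convenient.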
Some architectures, such as the Inception network, exploit this idea: they apply several convolutional layers (with different kernel sizes) in parallel and concatenate the resulting feature maps at the end. This has proven to work very well.
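As a rough sketch of that idea, a simplified block (not the exact GoogLeNet module, which also adds 1×1 reductions and a pooling branch) could look like this in PyTorch:

```python
import torch
import torch.nn as nn

class InceptionLikeBlock(nn.Module):
    """Parallel convolutions with different kernel sizes, concatenated channel-wise."""
    def __init__(self, in_ch):
        super().__init__()
        # 'same'-style padding keeps spatial sizes equal so the outputs can be concatenated
        self.branch1 = nn.Conv2d(in_ch, 16, kernel_size=1)
        self.branch3 = nn.Conv2d(in_ch, 16, kernel_size=3, padding=1)
        self.branch5 = nn.Conv2d(in_ch, 16, kernel_size=5, padding=2)

    def forward(self, x):
        # Each branch sees the same input; feature maps are stacked along the channel axis
        return torch.cat([self.branch1(x), self.branch3(x), self.branch5(x)], dim=1)

x = torch.randn(1, 3, 32, 32)      # batch of one RGB image
block = InceptionLikeBlock(in_ch=3)
print(block(x).shape)              # torch.Size([1, 48, 32, 32])
```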
Why is one filter not enough?
Each filter learns to detect only one kind of local pattern. After training, one kernel may fire on vertical edges, another on horizontal edges, a third on a corner or a blob of a particular colour, and so on; a single kernel therefore produces a single feature map that highlights just one type of structure in the input.
To classify an image, the network needs many such elementary features. Even a task as simple as recognizing handwritten digits involves strokes in several orientations, curves and endpoints, which is why even small networks trained on MNIST use many filters per layer.
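A tiny illustration of why one filter cannot do all the work: a vertical-edge detector and a horizontal-edge detector respond to completely different parts of the same image (the image and filters below are hand-made toy examples):

```python
import numpy as np
from scipy.signal import correlate2d

# Toy image: a bright square on a dark background
image = np.zeros((8, 8))
image[2:6, 2:6] = 1.0

# Two hand-crafted 3x3 filters: each responds to a different pattern
vertical_edges = np.array([[-1, 0, 1],
                           [-1, 0, 1],
                           [-1, 0, 1]], dtype=float)
horizontal_edges = vertical_edges.T

# Each filter yields its own feature map; neither can do the other's job
v_map = correlate2d(image, vertical_edges, mode="valid")
h_map = correlate2d(image, horizontal_edges, mode="valid")

print(np.unravel_index(np.abs(v_map).argmax(), v_map.shape))  # strongest response on a vertical edge
print(np.unravel_index(np.abs(h_map).argmax(), h_map.shape))  # strongest response on a horizontal edge
```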
Why do the filters end up learning different features rather than all converging to the same one?
First: they are initialized with different random weights, so they start from different points in parameter space.
Second: there is little benefit in two filters detecting exactly the same pattern, so during training gradient descent drives them toward different, complementary features.
In other words, a convolutional layer acts as a bank of feature detectors: the early layers extract simple patterns, and the deeper layers combine them into progressively more complex ones.
The CS231n course notes contain good visualizations of the filters and feature maps that a trained network actually learns.