Caffe: What does group param mean?

Question

I read the documentation on the group parameter:

group (g) [default 1]: If g > 1, we restrict the connectivity of each filter to a subset of the input. Specifically, the input and output channels are separated into g groups, and the ith output group channels will be only connected to the ith input group channels.

First of all, I don't quite understand what this means. And secondly, why should I use this? Can someone explain it a little better?

As I understand it, this means the following:

If I set g greater than 1, my input and output channels are divided into groups. But how exactly is this done? If I set it to 20 and I have 40 input channels, will I get groups of 20? And if there are 50 outputs, will I get one group of 20 and one group of 30?

deep-learning caffe conv-neural-network

thigi Nov 29 '16 at 18:15

2 answers

And secondly, why should I use [grouping]?

Grouping was originally introduced as an optimization in the paper that kicked off the current wave of neural network popularity:

Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. "ImageNet classification with deep convolutional neural networks." In Advances in Neural Information Processing Systems, pp. 1097-1105. 2012.

Figure 2 of that paper shows how grouping was used in this work. The Caffe authors originally added this feature so they could replicate the AlexNet architecture. However, grouping has since proven useful in other scenarios as well.

For example, both Facebook and Google have published papers showing that grouping can substantially reduce resource usage while largely preserving accuracy. The Facebook paper is ResNeXt, and the Google paper is MobileNets.

Martin thoma Dec 02 '16 at 8:10

Prune · Accepted Answer · 2016-11-29T22:09:04+0000

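The argument gives the number of groups, not their size. If you have 40 inputs and set g to 20, you will get 20 "lanes" of 2 channels each. The outputs are divided the same way, so Caffe requires num_output to be a multiple of g as well: with 50 outputs and g = 20 the layer is rejected at setup, because 50 cannot be split into 20 equal groups.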
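
A minimal sketch of how the channels are partitioned, assuming Caffe's rule that both the number of input channels and num_output must be divisible by group. The helper name `group_slices` is hypothetical and not part of Caffe's API; it only computes which channel ranges end up in each lane.

```python
def group_slices(channels, num_output, group):
    """Return one (input_range, output_range) pair per group.

    Hypothetical helper mirroring the divisibility requirement of
    Caffe's convolution layer: channels and num_output must both be
    multiples of group.
    """
    if channels % group != 0:
        raise ValueError(f"{channels} input channels not divisible by group={group}")
    if num_output % group != 0:
        raise ValueError(f"num_output={num_output} not divisible by group={group}")
    in_per_group = channels // group
    out_per_group = num_output // group
    # Group g owns a contiguous block of input and output channels.
    return [
        (range(g * in_per_group, (g + 1) * in_per_group),
         range(g * out_per_group, (g + 1) * out_per_group))
        for g in range(group)
    ]


# 40 inputs, group=20: 20 lanes of 2 input channels each
lanes = group_slices(40, 40, 20)
print(len(lanes), len(lanes[0][0]))  # 20 2

# 40 inputs, 50 outputs, group=2: each lane has 20 inputs and 25 outputs
print([(len(i), len(o)) for i, o in group_slices(40, 50, 2)])  # [(20, 25), (20, 25)]

# 40 inputs, 50 outputs, group=20: rejected, 50 is not a multiple of 20
try:
    group_slices(40, 50, 20)
except ValueError as err:
    print(err)  # num_output=50 not divisible by group=20
```

The ranges are contiguous because, per the documentation quoted in the question, the ith output group is wired only to the ith input group.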

Most often, you split into a small number of groups, for example 2. In that case you have two processing "lanes", or groups. For the 40 => 50 layer mentioned above, each group will have 20 inputs and 25 outputs. The layer is split in half, and forward and backward propagation in each half work only within that half. Note that group is set per convolution layer, so the split applies only to the layers on which you set it.
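
To make the "each half works only within its own half" point concrete, here is a naive pure-Python sketch of a grouped 1x1 convolution. It is for illustration only, not Caffe's implementation; the function name `grouped_conv1x1` and the nested-list (channels, positions) layout are assumptions, and a 1x1 kernel is used just to keep it short (larger kernels group channels the same way).

```python
def grouped_conv1x1(x, weights, group):
    """Naive grouped 1x1 convolution (illustrative sketch, not Caffe's code).

    x[c][p]       : value of input channel c at spatial position p
    weights[o][k] : coefficient linking output channel o to the k-th input
                    channel of its own group (channels // group per filter)
    Assumes len(x) and len(weights) are both divisible by group.
    """
    channels, num_output = len(x), len(weights)
    in_per_group = channels // group
    out_per_group = num_output // group
    positions = len(x[0])
    y = []
    for o in range(num_output):
        g = o // out_per_group          # group this output channel belongs to
        first_in = g * in_per_group     # first input channel of that group
        # Only the input channels of group g contribute to output channel o.
        y.append([
            sum(weights[o][k] * x[first_in + k][p] for k in range(in_per_group))
            for p in range(positions)
        ])
    return y


x = [[1.0], [2.0], [3.0], [4.0]]       # 4 input channels, 1 spatial position
w = [[1.0, 1.0] for _ in range(4)]      # 4 output channels, 4 // 2 = 2 weights each
print(grouped_conv1x1(x, w, group=2))   # [[3.0], [3.0], [7.0], [7.0]]
```

Changing input channel 3 (in the second group) alters only output channels 2 and 3. Because outputs in one group never depend on inputs from the other, the gradients during backpropagation also stay inside their own group.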

The processing advantage is that, for a layer with 40 inputs and 40 outputs, instead of 40^2 = 1600 connections you have 2 groups of 20^2 = 400 connections, 800 in total, or half as many. This cuts the layer's processing roughly in half, with very little loss in convergence.
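
The connection arithmetic can be checked with a small helper. This is a sketch that counts filter weights (biases excluded); the name `conv_weight_count` is hypothetical. The same number equals the multiply-adds per output position, so it is a proxy for compute, though the actual wall-clock speedup depends on the implementation.

```python
def conv_weight_count(channels, num_output, group, kernel_h=1, kernel_w=1):
    """Filter weights in a (possibly grouped) convolution, biases excluded.

    Each of the num_output filters spans only channels // group input
    channels, so grouping divides the count by group. The result also
    equals the multiply-adds needed per output spatial position.
    """
    return num_output * (channels // group) * kernel_h * kernel_w


print(conv_weight_count(40, 40, 1), conv_weight_count(40, 40, 2))  # 1600 800
print(conv_weight_count(40, 50, 1), conv_weight_count(40, 50, 2))  # 2000 1000
```

For a k x k kernel both counts grow by the same factor of k^2, so the 2x reduction from group = 2 is unchanged.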
