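The argument gives the number of groups, not the size of each group. If you have 40 inputs and set g to 20, you get 20 "lanes" of 2 channels each; with 50 outputs, you would get 10 groups of 2 and 10 groups of 3 (assuming the implementation accepts an uneven split; many require both channel counts to be divisible by the number of groups).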
More often, you split into a small number of groups, such as 2. In that case, you have two processing "lanes", or groups. For the 40 => 50 layer you mention, each group has 20 inputs and 25 outputs. Each layer is split in half, and each forward and backward propagation pass works only within its own half, over the range of layers to which the group parameter applies (I think it carries all the way through to the final layer).
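To make the channel arithmetic above concrete, here is a minimal sketch. `split_channels` is a hypothetical helper, not part of any framework, and the choice to give the extra channels to the first groups is an assumption made only for illustration.

```python
def split_channels(channels, groups):
    """Partition `channels` into `groups` groups, as evenly as possible.

    Hypothetical helper for illustration only; real frameworks often
    reject channel counts that do not divide evenly by `groups`.
    """
    base, extra = divmod(channels, groups)
    # The first `extra` groups each receive one additional channel.
    return [base + 1 if i < extra else base for i in range(groups)]


# g = 20: 40 inputs give twenty groups of 2 channels each.
print(split_channels(40, 20))
# g = 20: 50 outputs give ten groups of 3 and ten groups of 2.
print(split_channels(50, 20))
# g = 2: each group gets 20 inputs and 25 outputs.
print(split_channels(40, 2), split_channels(50, 2))
```

The group sizes always sum to the original channel count, so no channel is dropped or shared between groups.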
The processing advantage is that instead of 40^2 input connections, you have 2 sets of 20^2 connections, or half as many. This speeds up processing by roughly a factor of 2, with very little loss in convergence.
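The halving can be checked with a short calculation. This is a sketch under the assumption that both channel counts divide evenly by the number of groups; `connection_count` is a hypothetical name. It counts channel-to-channel connections per kernel position, so for a real convolution you would multiply by the kernel area.

```python
def connection_count(in_channels, out_channels, groups=1):
    """Count input-to-output channel connections in a grouped layer.

    Hypothetical helper; assumes both counts are divisible by `groups`.
    """
    if in_channels % groups or out_channels % groups:
        raise ValueError("channel counts must be divisible by groups")
    # Each group connects only its own inputs to its own outputs.
    return groups * (in_channels // groups) * (out_channels // groups)


# The 40 -> 40 case from the text: 40^2 versus 2 * 20^2.
print(connection_count(40, 40), connection_count(40, 40, groups=2))
# The 40 -> 50 layer: the ratio is the same.
print(connection_count(40, 50), connection_count(40, 50, groups=2))
```

In general the count shrinks by a factor equal to the number of groups, which is where the roughly twofold speedup for 2 groups comes from.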
Prune