Minibatch size selection for deep learning

In Ilya Sutskever's blog post, Deep Learning Overview, he describes how important it is to choose the right minibatch size to train your deep neural network effectively. His advice is to "use the smaller minibatch that runs efficiently on your machine." See the full quote below.

I have seen similar statements from other well-known deep learning researchers, but it is still unclear to me how to find the right minibatch size. Given that a larger minibatch can speed up training, it seems it would take a lot of experimentation to determine whether a particular minibatch size gives the best performance in terms of training speed.

I have a GPU with 4 GB of RAM and use the Caffe and Keras libraries. In this case, what is a practical heuristic for choosing a good minibatch size, given that each observation requires a certain amount of memory M?

Minibatches: Use minibatches. Modern computers cannot be efficient if you process one training case at a time. It is vastly more efficient to train the network on minibatches of 128 examples, because doing so results in massively greater throughput. It would actually be nice to use minibatches of size 1, and they would likely result in improved performance and lower overfitting; but the benefit of doing so is outweighed by the massive computational gains provided by minibatches. But do not use very large minibatches, because they tend to work less well and overfit more. So the practical recommendation is: use the smaller minibatch that runs efficiently on your machine.

1 answer

When we train the network and compute the forward pass, we must save all the intermediate activation outputs for the backward pass. So you simply need to calculate how much memory you will need to store all the relevant activation outputs from your forward pass, in addition to the other memory requirements (keeping your weights on the GPU, etc.). Note that if your network is quite deep, you may want to use a smaller batch size, since you may run out of memory.
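
For example, a very rough back-of-the-envelope estimate in Python/Keras might look like the sketch below. The small convnet is only a placeholder, and the memory model (float32 activations kept for the backward pass, plus weights, gradients, and optimizer state) is a simplifying assumption, not an exact accounting of what Caffe or Keras actually allocates:

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Placeholder model; substitute your own network here.
model = keras.Sequential([
    keras.Input(shape=(224, 224, 3)),
    layers.Conv2D(64, 3, padding="same", activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(128, 3, padding="same", activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(256, activation="relu"),
    layers.Dense(10, activation="softmax"),
])

def activations_per_example(model):
    """Count the activation values one example produces in a forward pass."""
    total = 0
    for layer in model.layers:
        shape = layer.output.shape[1:]  # drop the batch dimension
        total += int(np.prod([int(d) for d in shape]))
    return total

def rough_memory_mb(model, batch_size, bytes_per_value=4):
    """Crude estimate: float32 activations kept for backprop, plus parameters."""
    activation_bytes = activations_per_example(model) * batch_size * bytes_per_value
    # Weights + gradients + optimizer state, very roughly 3x the parameter count.
    weight_bytes = model.count_params() * bytes_per_value * 3
    return (activation_bytes + weight_bytes) / 1024 ** 2

for bs in (32, 64, 128, 256):
    print(f"batch={bs:>3}  ~{rough_memory_mb(model, bs):.0f} MB")
```

Comparing these estimates against your 4 GB of GPU RAM gives a rough upper bound on the batch size before you even start training.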

Batch size selection is a trade-off between memory constraints and performance/accuracy (usually evaluated using cross-validation).

Personally, I estimate by hand roughly how much GPU memory my forward/backward pass will use and then try a few values. If, for example, the largest batch size that fits is about 128, I will also check 32, 64, 96, etc., just to be thorough and see whether I can improve performance. This is usually only needed for a deeper network that is going to strain my GPU memory (I also only have a 4 GB card; I do not have access to the monster NVIDIA cards).
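
If you want to make that check systematic, a simple sweep over a few batch sizes might look like the following sketch; `build_model()`, `x_train`, `y_train`, `x_val`, and `y_val` are placeholders for your own compiled Keras model (with an accuracy metric) and data:

```python
import time

results = {}
for batch_size in (32, 64, 96, 128):
    model = build_model()  # placeholder: returns a freshly compiled Keras model
    start = time.time()
    history = model.fit(
        x_train, y_train,
        batch_size=batch_size,
        epochs=5,
        validation_data=(x_val, y_val),
        verbose=0,
    )
    elapsed = time.time() - start
    results[batch_size] = (max(history.history["val_accuracy"]), elapsed)

for bs, (val_acc, secs) in sorted(results.items()):
    print(f"batch={bs:>3}  best val accuracy={val_acc:.3f}  time={secs:.0f}s")
```

The elapsed time tells you about throughput, and the validation accuracy tells you whether a larger batch is hurting generalization.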

I think that, in general, more attention should be paid to network architecture, optimization methods/training tricks, and data preprocessing.


Source: https://habr.com/ru/post/1239700/
