How are multiple GPUs used in Caffe?

I want to know how Caffe uses multiple GPUs, so I can decide whether to upgrade to a single more powerful card or just buy another of the same card and run them together (SLI).
For example, should I buy one Titan X (12 GB) or two GTX 1080s (8 GB each)?
If I go with two 1080s in SLI, will my effective memory double? In other words, can I run a network that needs 12 GB or more of VRAM on them, or am I still limited to 8 GB? More generally, how is memory used in such scenarios? And what happens if two different cards (both NVIDIA) are installed (say a 980 and a 970)? Does Caffe use the available memory the same way?

3 answers

For example, should I buy one Titan X (12 GB) or two GTX 1080s (8 GB each)? If I go with two 1080s in SLI, will my effective memory double? Can I run a network that needs 12 GB or more of VRAM on them, or am I still limited to 8 GB?

No. The effective memory size in the case of two GPUs with 8 GB of RAM each is still 8 GB, but the effective batch size is doubled, which leads to more stable and faster training.
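As a rough illustration (a minimal sketch; solver.prototxt and the device IDs are placeholders for your own setup), each GPU in a data-parallel run holds a full replica of the network, so neither card can go beyond its own 8 GB even though the pair processes twice the batch per iteration:

    # Data-parallel training on two GPUs: each device keeps a complete
    # copy of the net, so per-GPU memory is bounded by that card's 8 GB.
    caffe train -solver solver.prototxt -gpu 0,1 &

    # While training runs, nvidia-smi reports memory per device; it is
    # not pooled across the two cards.
    nvidia-smi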

What happens if two different cards (both NVIDIA) are installed? Does Caffe use the available memory the same way? (Suppose one is a 980 and one is a 970.)

I think you will be limited by the weaker card and may run into driver problems, so I do not recommend trying this configuration. Also, from the documentation:

The current implementation has a "soft" assumption that the devices being used are homogeneous. In practice, any devices of the same general compute capability should work together, but performance and total size are limited by the smallest device being used. e.g. if you combine a TitanX and a GTX980, performance will be limited by the 980. Mixing vastly different levels of boards, e.g. Kepler and Fermi, is not supported.

To summarize: with a GPU that has a lot of RAM you can train deeper models; with multiple GPUs you can train a single model faster, or train a separate model on each GPU. I would choose a single GPU with a large amount of memory (Titan X), because deep networks are currently limited by GPU memory (for example ResNet-152 or some semantic segmentation networks), and more memory makes it possible to run deeper networks with a larger batch size. Otherwise, if you have tasks that fit on a single GPU (GTX 1080), you can buy 2 or 4 of them just to speed things up.

The documentation also describes how multi-GPU support works in Caffe:

The current implementation uses a tree reduction strategy. e.g. if there are 4 GPUs in the system, 0:1 and 2:3 will exchange gradients, then 0:2 (the top of the tree) will exchange gradients; 0 will calculate the updated model and push it back down the tree: 0->2, and then 0->1, 2->3.

https://github.com/BVLC/caffe/blob/master/docs/multigpu.md
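As a concrete example of the scenario in that quote (solver.prototxt is a placeholder; the exchange schedule itself is handled internally by Caffe), a four-GPU run just lists all four device IDs:

    # Caffe pairs the devices for gradient exchange (0:1 and 2:3, then 0:2),
    # computes the updated model on GPU 0, and pushes it back down the
    # tree (0->2, then 0->1 and 2->3).
    caffe train -solver solver.prototxt -gpu 0,1,2,3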


I do not believe that Caffe supports SLI mode. Two GPUs are treated as separate cards.

When you run Caffe and add the '-gpu' flag (assuming you are using the command line), you can specify which GPU to use (-gpu 0 or -gpu 1, for example). You can also specify multiple GPUs (-gpu 0,1,3), or use all of them (-gpu all).
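For instance (solver.prototxt stands in for whatever solver file you are training with):

    caffe train -solver solver.prototxt -gpu 0        # a single GPU, device 0
    caffe train -solver solver.prototxt -gpu 0,1,3    # a specific set of GPUs
    caffe train -solver solver.prototxt -gpu all      # every GPU in the machine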

When you use multiple GPUs, Caffe trains on all of them and then merges the gradient updates across the model replicas. This effectively doubles (or multiplies further, if you have more than 2 GPUs) the batch size for each iteration.

In my case, I started with an NVIDIA GTX 970 (a 4 GB card) and then upgraded to an NVIDIA GTX Titan X (the Maxwell version, with 12 GB) because my models were too big to fit on the GTX 970. I can run some of the smaller models on both cards (even though they are not matched), as long as the model fully fits into the smaller card's 4 GB. Using the standard ImageNet model, I could train on both cards and cut the training time in half.

If I remember correctly, other frameworks (TensorFlow, and possibly Microsoft CNTK) support splitting a model across different nodes, effectively increasing the available GPU memory the way you are describing. Although I have not personally tried either, I understand that you can define, at the level of each layer, on which device that layer runs.

Patrick


Perhaps a late answer, but Caffe supports GPU parallelism, which means you really can use both GPUs fully. However, I recommend getting two GPUs of equal memory size, since I don't think Caffe lets you choose the batch size per GPU.

As for memory usage: when using multiple GPUs, each GPU gets its own batch of the batch size specified in your train_val.prototxt, so if your batch size is, for example, 16 and you use 2 GPUs, you will have an effective batch size of 32.
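A minimal sketch of that arithmetic (the file names and the batch size of 16 are illustrative only):

    # train_val.prototxt, TRAIN phase, declares the per-GPU batch, e.g.:
    #   data_param { batch_size: 16 }
    #
    # Running on two GPUs therefore processes 2 x 16 = 32 examples per
    # iteration; the batch_size value cannot be set differently per device.
    caffe train -solver solver.prototxt -gpu 0,1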

Finally, I know that for things like games SLI tends to be much less efficient, and often much more problematic, than a single powerful GPU. So if you plan to use the GPU for more than deep learning, I would still recommend going with the Titan X.
