You must ask yourself two questions.
- How many CPUs do I have?
- What percentage of the time will a useful program actually be accessing the same map?
The first question gives you the maximum number of threads that can access the map at the same instant. You can have 10,000 threads, but if you have only 4 CPUs, at most 4 of them will be running at any one time.
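As a rough way to answer the first question on the JVM, you can ask the runtime how many processors it sees. This is only a sketch: `CpuCount` and `maxRunningThreads` are made-up names for illustration, and `availableProcessors()` reports logical processors available to the JVM, which may differ from the number of physical cores.

```java
public class CpuCount {
    // Logical processors the JVM may use; an upper bound on threads running at once.
    static int availableCpus() {
        return Runtime.getRuntime().availableProcessors();
    }

    // Hypothetical helper: however many threads exist, no more than `cpus` run simultaneously.
    static int maxRunningThreads(int threadCount, int cpus) {
        return Math.min(threadCount, cpus);
    }

    public static void main(String[] args) {
        int cpus = availableCpus();
        System.out.println("CPUs visible to the JVM: " + cpus);
        System.out.println("Of 10000 threads, at most "
                + maxRunningThreads(10_000, cpus) + " run at once");
    }
}
```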
The second question tells you how likely it is that any of those threads will be accessing the map while doing something useful. You can optimize the map for something useless (for example, a micro-benchmark), but there is little point in tuning for that, IMHO. Let's say you have a useful program that uses a map a lot. It might still spend 90% of its time doing something else, for example IO, accessing other maps, building keys or values, or doing something with the values it gets from the map.
Say you spend 10% of your time accessing the map on a machine with 4 CPUs. This means that, on average, 0.4 threads will be accessing the map at any given moment (or one thread about 40% of the time). In this case, a concurrency level of 1-4 is fine.
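The arithmetic above can be sketched in a few lines. This assumes the map in question is a `java.util.concurrent.ConcurrentHashMap`, which takes a concurrency level in its three-argument constructor; the class `ConcurrencyEstimate` and its methods are hypothetical helpers, and rounding up and capping at the CPU count is just one reasonable way to turn the estimate into a setting.

```java
import java.util.concurrent.ConcurrentHashMap;

public class ConcurrencyEstimate {
    // Average number of threads inside the map at once: CPUs x fraction of time spent in the map.
    static double expectedAccessors(int cpus, double fractionOfTimeInMap) {
        return cpus * fractionOfTimeInMap;
    }

    // Assumption: round the estimate up, but keep it between 1 and the CPU count.
    static int suggestedConcurrencyLevel(int cpus, double fractionOfTimeInMap) {
        int rounded = (int) Math.ceil(expectedAccessors(cpus, fractionOfTimeInMap));
        return Math.max(1, Math.min(cpus, rounded));
    }

    public static void main(String[] args) {
        int cpus = 4;            // example value from the text
        double fraction = 0.10;  // 10% of the time spent in the map

        int level = suggestedConcurrencyLevel(cpus, fraction);
        System.out.println("Expected accessors: " + expectedAccessors(cpus, fraction));
        System.out.println("Suggested concurrency level: " + level);

        // initialCapacity = 16, loadFactor = 0.75f, concurrencyLevel = level
        ConcurrentHashMap<String, Integer> map = new ConcurrentHashMap<>(16, 0.75f, level);
        map.put("example", 1);
    }
}
```

Note that since Java 8 the `concurrencyLevel` argument is treated only as a sizing hint, so on a modern JVM the exact value matters even less.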
In any case, a concurrency level higher than the number of CPUs you have is likely to be unnecessary, even for a micro-benchmark.