Multiple kernels in one program vs. one kernel per program

What is the actual difference between placing several kernels in one program and compiling a separate program for each kernel, apart from how the source code is organized? In particular, is register pressure determined by the size of the program or by the kernel actually selected from it? Is the sum of the __local storage of all the kernels allocated in order to run any one of them? Are there any other performance-related considerations (for example, the size of the code uploaded to the device)?

1 answer

This may be device specific, and I am speaking from experience with Intel GPUs. Program-scope resources are visible only to the kernels in that program. Beyond that, register allocation is per kernel, so one kernel in each of K programs versus K kernels in one program makes no difference to register pressure. Building and linking, however, happen per program. Compiling K kernels in one program is therefore less efficient in terms of startup time if you do not use all K kernels.


Source: https://habr.com/ru/post/1609981/

