How many grids in CUDA

How many CUDA grids are possible in the GPU?

Can two grids exist together on a GPU, or does one GPU run only one grid at a time?

Kernel1<<<gridDim, blockDim>>>(dst1, param1);
Kernel1<<<gridDim, blockDim>>>(dst2, param2);

Do the two kernels execute simultaneously or sequentially?

2 answers

If two kernels are launched as shown above, they will be serialized (executed sequentially). This is because, without any other code (that is, without stream management), both kernels are issued to the same CUDA stream. All CUDA calls issued to the same stream execute sequentially, even if you might expect otherwise because you are using cudaMemcpyAsync or something similar.

Of course, it is possible for multiple kernels to run asynchronously with respect to each other (possibly concurrently), but for that you need to use the CUDA streams API.

See section 3.2.5, "Asynchronous Concurrent Execution", in the CUDA C Programming Guide to learn more about streams and concurrent kernel execution. The NVIDIA CUDA SDK also includes several samples, such as simpleStreams, that illustrate these concepts. The concurrentKernels sample shows how to run multiple kernels at the same time (using streams). Note that concurrent kernel execution requires hardware of compute capability 2.0 or higher.

Also, to answer your first question, from section 3.2.5.3 of the CUDA C Programming Guide: "The maximum number of kernel launches that a device can execute concurrently is sixteen."

For reference, a "grid" is the entire array of threads associated with a single kernel launch.
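Since concurrent kernel execution depends on the device, you can query the CUDA runtime to check whether your GPU supports it. This is a minimal sketch using the standard cudaGetDeviceProperties call; the concurrentKernels field of cudaDeviceProp is nonzero when the device can run multiple kernels at once:

```cpp
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);  // query device 0

    printf("Compute capability: %d.%d\n", prop.major, prop.minor);
    // Nonzero if the device can execute multiple kernels concurrently
    printf("Concurrent kernels supported: %s\n",
           prop.concurrentKernels ? "yes" : "no");
    return 0;
}
```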


To expand on Robert's answer, here is an example of how you could use streams to run two instances of Kernel1 concurrently:

 cudaStream_t stream1;
 cudaStreamCreate(&stream1);
 cudaStream_t stream2;
 cudaStreamCreate(&stream2);
 Kernel1<<<gridDim, blockDim, 0, stream1>>>(dst1, param1);
 Kernel1<<<gridDim, blockDim, 0, stream2>>>(dst2, param2);

A few notes on concurrent execution with streams:

  • If we launch a kernel without specifying a stream, Kernel1<<<g, b>>>(), and then launch a kernel in a specific stream, Kernel2<<<g, b, 0, stream>>>(), then Kernel2 will wait for Kernel1 to complete.
  • When a kernel is launched without a stream (Kernel1<<<g, b>>>()), NVIDIA calls this "using the NULL stream".
  • If you use cudaEvents, your work can sometimes be serialized even if you distribute the kernels across multiple streams.
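Putting the pieces above together, here is a complete hedged sketch of a two-stream launch, including the synchronization and cleanup the snippet above omits. The kernel body and its parameters are placeholders (the question does not show Kernel1's actual signature):

```cpp
#include <cuda_runtime.h>

// Placeholder kernel standing in for Kernel1 (assumed signature)
__global__ void Kernel1(float *dst, float param) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    dst[i] = param;
}

int main() {
    const int n = 1024;
    float *dst1, *dst2;
    cudaMalloc(&dst1, n * sizeof(float));
    cudaMalloc(&dst2, n * sizeof(float));

    cudaStream_t stream1, stream2;
    cudaStreamCreate(&stream1);
    cudaStreamCreate(&stream2);

    dim3 blockDim(256);
    dim3 gridDim(n / 256);

    // Each launch goes to its own stream, so on compute capability 2.0+
    // hardware the two kernels may overlap.
    Kernel1<<<gridDim, blockDim, 0, stream1>>>(dst1, 1.0f);
    Kernel1<<<gridDim, blockDim, 0, stream2>>>(dst2, 2.0f);

    // Wait for both streams to finish before using the results.
    cudaStreamSynchronize(stream1);
    cudaStreamSynchronize(stream2);

    cudaStreamDestroy(stream1);
    cudaStreamDestroy(stream2);
    cudaFree(dst1);
    cudaFree(dst2);
    return 0;
}
```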

Source: https://habr.com/ru/post/1438658/
