In many resources on the Internet you can find differing uses of the terms "memory-bound", "bandwidth-bound" and "latency-bound" for kernels. It seems to me that authors sometimes use their own definitions of these terms, and I think it would be very useful for someone to draw a clear distinction.
As far as I understand: Bandwidth-bound kernels approach the physical limits of the device in terms of access to global memory. For instance, an application uses 170 GB/s of the 177 GB/s available on an M2090.
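To make the 170 GB/s of 177 GB/s figure concrete, here is a minimal sketch of how achieved bandwidth is commonly computed: bytes read plus bytes written, divided by kernel time. The function names, the 1 GiB transfer size and the 12.63 ms timing are hypothetical, picked only so the result lands near 170 GB/s; the 177 GB/s peak is the number quoted above.

```python
# Sketch: achieved global-memory bandwidth and its fraction of peak.
PEAK_GBS = 177.0  # M2090 peak as quoted in the post


def effective_bandwidth_gbs(bytes_read, bytes_written, seconds):
    """Achieved bandwidth in GB/s, taking 1 GB = 1e9 bytes."""
    return (bytes_read + bytes_written) / seconds / 1e9


def utilization(achieved_gbs, peak_gbs=PEAK_GBS):
    """Fraction of the peak bandwidth that is actually used."""
    return achieved_gbs / peak_gbs


# Hypothetical measurement: a copy kernel reads 1 GiB and writes 1 GiB
# in 12.63 ms (made-up timing, for illustration only).
n_bytes = 2**30
achieved = effective_bandwidth_gbs(n_bytes, n_bytes, 12.63e-3)
print(f"{achieved:.1f} GB/s, {utilization(achieved):.1%} of peak")
```

Under this reading, a kernel whose utilization is close to 1 would be the bandwidth case described above.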
A latency-bound kernel is one whose main stall reason is memory fetches. Thus, we do not saturate the global memory bus, but we still have to wait for the data to arrive in the kernel.
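One way to reason about this case is Little's law: to keep the memory bus saturated, the number of bytes in flight must be at least latency times bandwidth. The sketch below assumes a 600 ns memory latency, which is an illustrative number and not a measured M2090 value; the function names are hypothetical as well.

```python
# Sketch: Little's law applied to global-memory traffic.
PEAK_BYTES_PER_S = 177e9    # 177 GB/s, as quoted in the post
ASSUMED_LATENCY_S = 600e-9  # assumption: illustrative latency, not measured


def bytes_in_flight_needed(latency_s, bandwidth_bytes_per_s):
    """Bytes that must be outstanding to sustain the given bandwidth."""
    return latency_s * bandwidth_bytes_per_s


def achievable_bandwidth(bytes_in_flight, latency_s, peak_bytes_per_s):
    """Bandwidth reachable with a given amount of outstanding traffic."""
    return min(peak_bytes_per_s, bytes_in_flight / latency_s)


needed = bytes_in_flight_needed(ASSUMED_LATENCY_S, PEAK_BYTES_PER_S)
# A kernel that keeps only a quarter of that in flight cannot saturate the bus.
starved = achievable_bandwidth(needed / 4, ASSUMED_LATENCY_S, PEAK_BYTES_PER_S)
print(f"needed: {needed:.0f} bytes, starved kernel: {starved / 1e9:.2f} GB/s")
```

In this picture the bus has headroom, yet the kernel still waits on each fetch because too few requests are outstanding to hide the latency.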
A compute-bound kernel is one in which computation dominates the kernel time, assuming there is no problem feeding the kernel with data from memory and there is good overlap between arithmetic and latency.
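A rough way to check the compute case is a roofline-style estimate, comparing a kernel's arithmetic intensity (FLOPs per byte of global-memory traffic) with the ratio of peak compute to peak bandwidth. In the sketch below the 1331 GFLOP/s single-precision peak is an assumed figure for the M2090 that should be checked against the datasheet, and the function names are hypothetical.

```python
# Sketch: roofline-style estimate of which resource limits a kernel.
PEAK_BW = 177e9             # bytes/s, as quoted in the post
ASSUMED_PEAK_FLOPS = 1331e9  # assumption: single-precision peak, verify first


def attainable_flops(intensity, peak_flops=ASSUMED_PEAK_FLOPS, peak_bw=PEAK_BW):
    """Upper bound on FLOP/s for a given arithmetic intensity (FLOP/byte)."""
    return min(peak_flops, intensity * peak_bw)


def limiting_resource(intensity, peak_flops=ASSUMED_PEAK_FLOPS, peak_bw=PEAK_BW):
    """Return "compute" or "memory" depending on which ceiling applies."""
    ridge = peak_flops / peak_bw  # intensity at which the two ceilings meet
    return "compute" if intensity >= ridge else "memory"


# SAXPY (y = a*x + y) in single precision: 2 FLOPs per element,
# 12 bytes of traffic per element (read x, read y, write y).
saxpy_intensity = 2 / 12
print(limiting_resource(saxpy_intensity), attainable_flops(saxpy_intensity) / 1e9)
```

This simple model only separates a compute ceiling from a memory ceiling; on its own it does not distinguish the bandwidth case from the latency case.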
If I have understood the above correctly, what would a memory-bound kernel be? Is there any ambiguity, and if so, should we restrict the conversation to the three terms above?
Thanks!