CUDA: how much slower is a transfer over PCI-E?

If I transfer a single byte from the CUDA device over PCI-E to the host (zero-copy memory), how slow is that compared to transferring something like 200 megabytes?

What I would like to know is this: given that PCI-E transfers are slow from the CUDA device's point of view, does it make a difference whether I transfer a single byte or a huge amount of data? Or, since memory transfers are carried out in “packets”, is transferring a single byte extremely expensive and wasteful relative to transferring 200 MB?

1 answer

I hope this explains it all. The data was generated by bandwidthTest from the CUDA samples. The hardware environment is PCI-E v2.0, a Tesla M2090, and 2x Xeon E5-2609. Note that both axes are on a log scale.

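From this plot we can see that the overhead of initiating a transfer request takes a constant amount of time, regardless of the transfer size. A regression analysis of the data gives an estimated fixed overhead of 4.9 µs for H2D (host to device), 3.3 µs for D2H (device to host), and 3.0 µs for D2D (device to device). So a one-byte transfer costs roughly the overhead alone, while the time for a large transfer such as 200 MB is dominated by its size divided by the bandwidth.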

[Figure: bandwidthTest results for H2D, D2H, and D2D transfers; both axes on a log scale]
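
The constant-overhead observation can be written as a simple linear model, t(n) = t0 + n / B, where t0 is the fixed per-transfer overhead and B is the sustained bandwidth. The sketch below is only an illustration of that model, not the code used to produce the measurements above. The 4.9 µs H2D overhead is taken from the answer; the bandwidth value `ASSUMED_BANDWIDTH` is a placeholder chosen for illustration and is not a figure from the answer, and the function names are hypothetical. The "measurements" fed into the fit are synthetic points generated from the model itself, purely to show how a regression recovers the overhead as the intercept.

```python
# Linear transfer-time model: t(n) = overhead + n / bandwidth.
H2D_OVERHEAD_S = 4.9e-6     # fixed H2D overhead quoted in the answer (seconds)
ASSUMED_BANDWIDTH = 6.0e9   # bytes/s; ASSUMPTION for illustration, not measured


def transfer_time(nbytes, overhead=H2D_OVERHEAD_S, bandwidth=ASSUMED_BANDWIDTH):
    """Modelled time in seconds to move nbytes in a single transfer."""
    return overhead + nbytes / bandwidth


def fit_overhead(sizes, times):
    """Ordinary least-squares line through (size, time) points.

    Returns (overhead_seconds, bandwidth_bytes_per_second): the intercept of
    the fitted line and the reciprocal of its slope.
    """
    n = len(sizes)
    mean_x = sum(sizes) / n
    mean_y = sum(times) / n
    sxx = sum((x - mean_x) ** 2 for x in sizes)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(sizes, times))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, 1.0 / slope


# Synthetic sample points from 1 byte to 64 MiB (generated, not measured).
sizes = [2 ** k for k in range(0, 27, 2)]
times = [transfer_time(s) for s in sizes]

if __name__ == "__main__":
    small = transfer_time(1)
    large = transfer_time(200 * 1024 * 1024)
    print(f"1 byte : {small * 1e6:.3f} us total")
    print(f"200 MiB: {large * 1e3:.3f} ms total, "
          f"{large / (200 * 1024 * 1024) * 1e9:.4f} ns per byte")

    t0, bw = fit_overhead(sizes, times)
    print(f"fitted overhead: {t0 * 1e6:.2f} us, fitted bandwidth: {bw / 1e9:.2f} GB/s")
```

Under this model a single byte pays the whole overhead by itself, so its cost per byte is enormous, whereas a 200 MB transfer spreads the same few microseconds over the entire payload. To apply the fit to real numbers, replace `sizes` and `times` with measurements from your own hardware.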


Source: https://habr.com/ru/post/952412/

