If I transfer a single byte from a CUDA kernel to the host over PCI-E (using zero-copy memory), how slow is that compared with transferring something like 200 megabytes?
What I would like to know is this: given that PCI-E transfers are slow from the kernel's point of view, does it make a difference whether I transfer a single byte or a huge amount of data? Or, since memory transfers are carried out in “packets”, is transmitting a single byte disproportionately expensive and wasteful compared with transferring 200 MB?
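To make the question concrete, here is a rough sketch of the cost model I have in mind: each transfer pays a fixed overhead plus a part proportional to its size. The function names and the default values of `latency_s` and `bandwidth_bytes_per_s` are placeholders I made up for illustration, not measurements; real figures depend on the PCI-E generation and the hardware.

```python
# Simple "fixed overhead + size / bandwidth" model of one transfer.
# Both default parameters are illustrative assumptions, not measured values.

def transfer_time(size_bytes, latency_s=10e-6, bandwidth_bytes_per_s=6e9):
    """Estimated time in seconds to move size_bytes in one transfer."""
    return latency_s + size_bytes / bandwidth_bytes_per_s


def effective_bandwidth(size_bytes, latency_s=10e-6, bandwidth_bytes_per_s=6e9):
    """Useful bytes per second actually achieved for a transfer of this size."""
    return size_bytes / transfer_time(size_bytes, latency_s, bandwidth_bytes_per_s)


if __name__ == "__main__":
    for size in (1, 4096, 200 * 1024**2):
        t = transfer_time(size)
        bw = effective_bandwidth(size)
        print(f"{size:>12} bytes: {t:.6e} s, {bw:.3e} bytes/s effective")
```

If this model is roughly right, the one-byte transfer is dominated by the fixed overhead, while the 200 MB transfer is dominated by the bandwidth term. What I am unsure about is how large that fixed overhead really is for zero-copy access.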