There are a few things you can try to mitigate the PCIe bottleneck:
- Asynchronous transfers - let you overlap computation with bulk data transfer.
- Mapped memory - lets a kernel read and write host memory directly while it runs, so data moves to and from the GPU during execution.
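Note, however, that neither technique makes the transfer itself any faster; both only reduce the time the GPU spends waiting for data.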
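With the cudaMemcpyAsync API you can start a transfer, launch kernels that do not depend on it, synchronize, and then launch the kernels that need the transferred data. If you can arrange your algorithm so that useful work happens while the copy is in flight, asynchronous transfers are a good fit.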
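The overlap described above can be sketched as follows. This is a minimal illustration and not a tuned implementation: the kernel scale, the function overlap_demo, the two-stream split and the block size of 256 are all assumptions made for the example. It assumes a toolkit recent enough to provide cudaDeviceSynchronize and a device that can overlap copies with kernel execution; on other hardware the code still runs, but the work is serialized.

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel used only for this sketch: scales each element in place.
__global__ void scale(float *data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Splits n floats into two chunks and queues copy-in, kernel and copy-out
// for each chunk on its own stream, so one chunk's copy may overlap the
// other chunk's kernel. Returns 0 on success, 1 on any failure.
int overlap_demo(int n) {
    const int kStreams = 2;
    const int chunk = (n + kStreams - 1) / kStreams;
    float *h = nullptr, *d = nullptr;

    // Asynchronous copies only overlap when the host buffer is page-locked.
    if (cudaMallocHost((void **)&h, n * sizeof(float)) != cudaSuccess) return 1;
    if (cudaMalloc((void **)&d, n * sizeof(float)) != cudaSuccess) {
        cudaFreeHost(h);
        return 1;
    }
    for (int i = 0; i < n; ++i) h[i] = 1.0f;

    cudaStream_t streams[kStreams];
    for (int s = 0; s < kStreams; ++s) cudaStreamCreate(&streams[s]);

    for (int s = 0; s < kStreams; ++s) {
        int offset = s * chunk;
        int count = (n - offset < chunk) ? (n - offset) : chunk;
        if (count <= 0) continue;
        size_t bytes = count * sizeof(float);
        // Operations in the same stream run in order; different streams may overlap.
        cudaMemcpyAsync(d + offset, h + offset, bytes,
                        cudaMemcpyHostToDevice, streams[s]);
        scale<<<(count + 255) / 256, 256, 0, streams[s]>>>(d + offset, count, 2.0f);
        cudaMemcpyAsync(h + offset, d + offset, bytes,
                        cudaMemcpyDeviceToHost, streams[s]);
    }

    // Wait for every stream to finish before reading the results on the host.
    int failed = (cudaDeviceSynchronize() != cudaSuccess);
    for (int i = 0; i < n && !failed; ++i) {
        if (h[i] != 2.0f) failed = 1;
    }

    for (int s = 0; s < kStreams; ++s) cudaStreamDestroy(streams[s]);
    cudaFree(d);
    cudaFreeHost(h);
    return failed;
}
```

To try it, call overlap_demo(1 << 20) from main and compile with nvcc. The per-call error checks are kept short for readability; real code would check the return value of every runtime call.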
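With the cudaHostAlloc API you can allocate page-locked host memory. When it is allocated with the cudaHostAllocMapped flag, the memory is mapped into the device address space, so a kernel can read and write it directly and no explicit copy is needed. Mapped memory tends to work best when each element is read or written only once, because every access still crosses the PCIe bus.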
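A minimal sketch of mapped (zero-copy) memory is shown below. The kernel add_one and the function mapped_demo are hypothetical names chosen for the example, and it assumes device 0 reports canMapHostMemory. On older toolkits the cudaSetDeviceFlags call has to happen before the CUDA context is created; on recent ones with unified addressing it is likely redundant, so its return value is ignored here.

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel used only for this sketch: adds 1 to each element.
__global__ void add_one(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1.0f;
}

// Runs add_one directly on host memory through a device-side alias.
// Returns 0 on success, 1 on any failure.
int mapped_demo(int n) {
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) return 1;
    if (!prop.canMapHostMemory) return 1;  // device cannot map host memory

    // Request host-memory mapping; ignored if the context already exists.
    cudaSetDeviceFlags(cudaDeviceMapHost);
    cudaGetLastError();  // clear any error left by the call above

    float *h = nullptr, *d_alias = nullptr;
    if (cudaHostAlloc((void **)&h, n * sizeof(float),
                      cudaHostAllocMapped) != cudaSuccess) return 1;
    for (int i = 0; i < n; ++i) h[i] = 0.0f;

    // Obtain the device pointer that refers to the same host buffer.
    if (cudaHostGetDevicePointer((void **)&d_alias, h, 0) != cudaSuccess) {
        cudaFreeHost(h);
        return 1;
    }

    // No cudaMemcpy: the kernel's loads and stores travel over PCIe.
    add_one<<<(n + 255) / 256, 256>>>(d_alias, n);
    int failed = (cudaDeviceSynchronize() != cudaSuccess);

    for (int i = 0; i < n && !failed; ++i) {
        if (h[i] != 1.0f) failed = 1;
    }
    cudaFreeHost(h);
    return failed;
}
```

Calling mapped_demo(1 << 16) from main and compiling with nvcc is enough to try it. Each element is touched exactly once, which is the access pattern where mapped memory usually pays off.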
For more detail, see sections 3.2.6-3.2.7 of the CUDA 3.1 documentation for CUDA. Section 3 of the OpenCL Best Practices Guide covers the same techniques for OpenCL.