How to eliminate bottlenecks for reconfiguring host + device memory in OpenCL / CUDA

Question

If my algorithm is bottlenecked by host-to-device and device-to-host memory transfers, is the only solution a different or revised algorithm?

+3

memory opencl cuda nvidia

smuggledpancakes Oct 19 '10 at 20:04

2 answers

+3

Paul R Oct 19 '10 at 20:15

wnbell · Accepted Answer · 2010-10-19T20:41:13+0000

There are a few things you can try to mitigate the PCIe bottleneck:

Asynchronous transfers - let you overlap computation with bulk transfers.
Mapped memory - lets a kernel stream data to / from the GPU while it is executing.

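Note that neither technique makes the transfer itself any faster; they only reduce the time the GPU spends waiting for the data to arrive.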

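With the cudaMemcpyAsync API you can start a transfer, launch one or more kernels that do not depend on its result, synchronize the host and device, and then launch the kernels that were waiting for the transfer to complete. If you can structure your algorithm so that productive work is done while the transfer is in flight, asynchronous copies are a good solution.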
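A minimal sketch of the cudaMemcpyAsync pattern, assuming a single GPU and a recent CUDA runtime. The kernels independent_work and dependent_work, the helper run_overlap, and the CHECK macro are hypothetical names made up for illustration; they are not from the answer. The host buffer is page-locked because copies from pageable memory generally cannot overlap with kernels, and cudaDeviceSynchronize is used as the host/device synchronization point.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical error-checking helper (an assumption of this sketch).
#define CHECK(call)                                                        \
    do {                                                                   \
        cudaError_t err_ = (call);                                         \
        if (err_ != cudaSuccess) {                                         \
            fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err_)); \
            exit(1);                                                       \
        }                                                                  \
    } while (0)

// Hypothetical kernel that does NOT need the data being transferred.
__global__ void independent_work(float *acc, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) acc[i] = acc[i] * 2.0f + 1.0f;
}

// Hypothetical kernel that DOES need the transferred data.
__global__ void dependent_work(const float *in, float *acc, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) acc[i] += in[i];
}

// Runs the four steps and returns the first element of the result.
float run_overlap(int n) {
    const size_t bytes = n * sizeof(float);
    float *h_in = NULL, *d_in = NULL, *d_acc = NULL;

    // Page-locked host buffer, so the copy can be truly asynchronous.
    CHECK(cudaHostAlloc((void **)&h_in, bytes, cudaHostAllocDefault));
    for (int i = 0; i < n; ++i) h_in[i] = 1.0f;

    CHECK(cudaMalloc((void **)&d_in, bytes));
    CHECK(cudaMalloc((void **)&d_acc, bytes));
    CHECK(cudaMemset(d_acc, 0, bytes));
    CHECK(cudaDeviceSynchronize());  // make sure setup has finished

    cudaStream_t copy_stream, compute_stream;
    CHECK(cudaStreamCreate(&copy_stream));
    CHECK(cudaStreamCreate(&compute_stream));

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;

    // 1. Start the transfer; the call returns without waiting for it.
    CHECK(cudaMemcpyAsync(d_in, h_in, bytes, cudaMemcpyHostToDevice, copy_stream));
    // 2. Launch work that does not depend on the transfer, in another stream.
    independent_work<<<blocks, threads, 0, compute_stream>>>(d_acc, n);
    // 3. Synchronize host and device: copy and kernel are both done after this.
    CHECK(cudaDeviceSynchronize());
    // 4. Launch the work that was waiting for the transferred data.
    dependent_work<<<blocks, threads, 0, compute_stream>>>(d_in, d_acc, n);
    CHECK(cudaStreamSynchronize(compute_stream));

    float first = 0.0f;
    CHECK(cudaMemcpy(&first, d_acc, sizeof(float), cudaMemcpyDeviceToHost));

    CHECK(cudaStreamDestroy(copy_stream));
    CHECK(cudaStreamDestroy(compute_stream));
    CHECK(cudaFree(d_in));
    CHECK(cudaFree(d_acc));
    CHECK(cudaFreeHost(h_in));
    return first;
}

int main() {
    printf("result[0] = %f\n", run_overlap(1 << 20));
    return 0;
}
```

Whether the copy and the first kernel actually overlap depends on the device having a free copy engine; a profiler timeline is the usual way to confirm it.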

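With the cudaHostAlloc API you can allocate host memory that the GPU can read and write directly. The reads and writes are not any faster, but because the kernel performs them itself, they require no host/device synchronization and they overlap with kernel execution. As a result, most of the transfer time is hidden. Mapped memory has restrictions and limitations that you need to be aware of.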
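A minimal sketch of mapped (zero-copy) memory with cudaHostAlloc, assuming a device that reports canMapHostMemory. The kernel scale, the helper run_mapped, and the CHECK macro are hypothetical names used only for illustration. The kernel reads and writes host memory through device pointers obtained with cudaHostGetDevicePointer, so there is no explicit cudaMemcpy at all.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical error-checking helper (an assumption of this sketch).
#define CHECK(call)                                                        \
    do {                                                                   \
        cudaError_t err_ = (call);                                         \
        if (err_ != cudaSuccess) {                                         \
            fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err_)); \
            exit(1);                                                       \
        }                                                                  \
    } while (0)

// Hypothetical kernel: every load and store goes to mapped host memory.
__global__ void scale(const float *in, float *out, int n, float k) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = k * in[i];
}

// Scales a mapped host buffer on the GPU and returns the first result.
float run_mapped(int n) {
    const size_t bytes = n * sizeof(float);

    cudaDeviceProp prop;
    CHECK(cudaGetDeviceProperties(&prop, 0));
    if (!prop.canMapHostMemory) {
        fprintf(stderr, "device cannot map host memory\n");
        exit(1);
    }

    // Older toolkits require this before the context is created; the call may
    // be unnecessary on newer runtimes, so its result is deliberately ignored.
    cudaSetDeviceFlags(cudaDeviceMapHost);
    (void)cudaGetLastError();  // clear any error left by the call above

    float *h_in = NULL, *h_out = NULL;
    CHECK(cudaHostAlloc((void **)&h_in, bytes, cudaHostAllocMapped));
    CHECK(cudaHostAlloc((void **)&h_out, bytes, cudaHostAllocMapped));
    for (int i = 0; i < n; ++i) h_in[i] = 2.0f;

    // Device-side aliases of the same host allocations.
    float *d_in = NULL, *d_out = NULL;
    CHECK(cudaHostGetDevicePointer((void **)&d_in, h_in, 0));
    CHECK(cudaHostGetDevicePointer((void **)&d_out, h_out, 0));

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    scale<<<blocks, threads>>>(d_in, d_out, n, 3.0f);

    // The host must wait for the kernel before reading what it wrote.
    CHECK(cudaDeviceSynchronize());
    float first = h_out[0];

    CHECK(cudaFreeHost(h_in));
    CHECK(cudaFreeHost(h_out));
    return first;
}

int main() {
    printf("result[0] = %f\n", run_mapped(1 << 16));
    return 0;
}
```

Because every access crosses the PCIe bus, this tends to pay off only when each element is read or written once with coalesced accesses; data that the kernel reuses is usually better copied to device memory first.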

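See Sections 3.2.6-3.2.7 of the CUDA Programming Guide for CUDA 3.1 for how to use these features in CUDA. Chapter 3 of the OpenCL Best Practices Guide explains how to do the same in OpenCL.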
