CUDA memory release is painfully slow

I allocate some floating-point arrays (quite large, about 9,000,000 elements each) on the GPU using cudaMalloc((void**)&(storage->data), size * sizeof(float)). At the end of my program, I free this memory using cudaFree(storage->data);.

The problem is that the first cudaFree call is very slow, taking about 10 seconds, while the others are almost instantaneous.

My question is this: what can cause this difference? Is deallocating memory on the GPU generally slow?

2 answers

As stated on the NVIDIA forums, this is almost certainly a problem with the way you are timing things, not with cudaFree itself.
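One plausible reading of this answer is that the first cudaFree is absorbing the cost of earlier work: kernel launches are asynchronous, and cudaFree waits for outstanding GPU work before it returns. The sketch below shows one way to time the call in isolation. It is a minimal example under assumptions, not the asker's code: the helper name time_free_ms is hypothetical, and cudaDeviceSynchronize is the current runtime call (older toolkits such as CUDA 2.2 used cudaThreadSynchronize for the same purpose).

```cuda
#include <chrono>
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper (not from the original post): allocates `count` floats,
// then returns the wall-clock time in milliseconds spent inside cudaFree alone.
// Returns -1.0 if any CUDA call fails.
static double time_free_ms(size_t count) {
    float *data = nullptr;

    // Touch the runtime first so context creation is not charged to later calls.
    cudaFree(0);

    if (cudaMalloc((void **)&data, count * sizeof(float)) != cudaSuccess) {
        return -1.0;
    }

    // ... kernels that use `data` would be launched here ...

    // Wait for all previously queued GPU work, so that the timer below
    // measures only the deallocation and not pending kernels or copies.
    if (cudaDeviceSynchronize() != cudaSuccess) {
        cudaFree(data);
        return -1.0;
    }

    auto start = std::chrono::steady_clock::now();
    cudaError_t err = cudaFree(data);
    auto stop = std::chrono::steady_clock::now();

    if (err != cudaSuccess) {
        return -1.0;
    }
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

int main() {
    // 9,000,000 elements, matching the array size mentioned in the question.
    double ms = time_free_ms(9000000);
    if (ms < 0.0) {
        fprintf(stderr, "a CUDA call failed\n");
        return 1;
    }
    printf("cudaFree took %.3f ms\n", ms);
    return 0;
}
```

If the time reported here is small while the unsynchronized measurement is large, the delay most likely belongs to the work that ran before the free rather than to the free itself.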


It should not be that slow; on Linux with CUDA 2.2 it takes a fraction of a second. Have you tried running the host and device profilers to find out why it is slow? Are you freeing many separate allocations? That carries some penalty, but not one that large.


Source: https://habr.com/ru/post/1299698/

