I am getting a CUDA memory allocation error when calling cudaGraphicsGLRegisterBuffer(). I have a fairly large OpenGL PBO that is shared between OpenGL and CUDA. The PBO is created as follows:
    GLuint buffer;
    glGenBuffers(1, &buffer);
    glBindBuffer(GL_PIXEL_UNPACK_BUFFER, buffer);
    glBufferData(GL_PIXEL_UNPACK_BUFFER, rows * cols * 4, NULL, GL_DYNAMIC_COPY);
    glUnmapBuffer(GL_PIXEL_UNPACK_BUFFER);
    glBindBuffer(GL_PIXEL_UNPACK_BUFFER, 0);
The buffer is quite large (width and height are both 5000), but it allocates fine on my GPU. I share it between OpenGL and CUDA through a simple wrapper class:
    class CudaPBOGraphicsResource {
    public:
        CudaPBOGraphicsResource(GLuint pbo_id);
        ~CudaPBOGraphicsResource();

        inline cudaGraphicsResource_t resource() const { return _cgr; }

    private:
        cudaGraphicsResource_t _cgr;
    };

    CudaPBOGraphicsResource::CudaPBOGraphicsResource(GLuint pbo_id)
    {
        checkCudaErrors(cudaGraphicsGLRegisterBuffer(&_cgr, pbo_id,
                                                     cudaGraphicsRegisterFlagsNone));
        checkCudaErrors(cudaGraphicsMapResources(1, &_cgr, 0));
    }

    CudaPBOGraphicsResource::~CudaPBOGraphicsResource()
    {
        if (_cgr) {
            checkCudaErrors(cudaGraphicsUnmapResources(1, &_cgr, 0));
        }
    }
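For reference, this is my understanding of the matching call pairs in the CUDA runtime interop API — register/unregister around the buffer's lifetime, map/unmap around each use. A sketch with error checking elided (`pbo_id` is assumed to be the buffer created above):

```cpp
#include <cuda_runtime.h>
#include <cuda_gl_interop.h>

void interop_lifecycle(GLuint pbo_id)
{
    cudaGraphicsResource_t cgr = 0;

    // One-time setup: register the GL buffer object with CUDA.
    cudaGraphicsGLRegisterBuffer(&cgr, pbo_id, cudaGraphicsRegisterFlagsNone);

    // Per use: map, fetch the device pointer, run kernels, unmap.
    cudaGraphicsMapResources(1, &cgr, 0);
    void  *dev_ptr   = 0;
    size_t num_bytes = 0;
    cudaGraphicsResourceGetMappedPointer(&dev_ptr, &num_bytes, cgr);
    // ... launch kernels on dev_ptr here ...
    cudaGraphicsUnmapResources(1, &cgr, 0);

    // Teardown: release the registration when done with the buffer.
    cudaGraphicsUnregisterResource(cgr);
}
```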
Each frame I then hand the buffer to CUDA as follows:
    {
        CudaPBOGraphicsResource input_cpgr(pbo_id);
        uchar4 *input_ptr = 0;
        size_t num_bytes;
        checkCudaErrors(cudaGraphicsResourceGetMappedPointer(
            (void **)&input_ptr, &num_bytes, input_cpgr.resource()));
        call_my_kernel(input_ptr);
    }
This works on my inputs for a while, but then it fails with:
    CUDA error code=2(cudaErrorMemoryAllocation) "cudaGraphicsGLRegisterBuffer(&_cgr, pbo_id, cudaGraphicsRegisterFlagsNone)"
    Segmentation fault
I am not sure why any memory allocation happens here at all, since I thought the buffer was shared. I added a cudaDeviceSynchronize() after the kernel call, but the error persists. My call_my_kernel() function currently does almost nothing, so there are no other CUDA calls that could be causing this error!
I am using CUDA 7 on Linux with a Quadro K4000 card.
EDIT: I updated the driver to the latest version (346.72) and the error still occurs. It is also independent of the kernel invocation; just calling cudaGraphicsGLRegisterBuffer() seems to leak memory on the GPU. Running nvidia-smi while the program runs shows GPU memory usage growing steadily. I still don't understand why any kind of copying would happen ...
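To make the observation concrete, a loop like the following sketch (assuming `pbo_id` refers to the valid PBO created above, and no kernel launched at all) is enough to show the steady growth in nvidia-smi. Each iteration constructs and destroys one CudaPBOGraphicsResource, i.e. registers and maps the buffer and then unmaps it on scope exit:

```cpp
// Hypothetical repro loop: watch nvidia-smi while this runs.
for (int i = 0; i < 10000; ++i) {
    CudaPBOGraphicsResource r(pbo_id);  // ctor: register + map
    // no kernel, no other CUDA calls
}                                       // dtor: unmap on scope exit
```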