I am working on an OS X application for a multi-GPU setup (a Late 2013 Mac Pro) that uses OpenCL (on the secondary GPU) to generate a texture that is later drawn to the screen using OpenGL (on the primary GPU). The application is CPU-bound because of calls to glBindTexture() and glBegin(), both of which spend basically all of their time in:
_platform_memmove$VARIANT$Ivybridge
which is part of the video driver:
AMDRadeonX4000GLDriver
Setup: creates an OpenGL texture (glPixelBuffer) and then its OpenCL counterpart (clPixelBuffer).
    cl_int clerror = 0;
    GLuint glPixelBuffer = 0;
    cl_mem clPixelBuffer = 0;

    glGenTextures(1, &glPixelBuffer);
    glBindTexture(GL_TEXTURE_2D, glPixelBuffer);
    glTexParameterf(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
    glTexParameterf(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, 2048, 2048, 0, GL_RGBA, GL_FLOAT, NULL);
    glBindTexture(GL_TEXTURE_2D, 0);

    clPixelBuffer = clCreateFromGLTexture(_clShareGroupContext, CL_MEM_WRITE_ONLY, GL_TEXTURE_2D, 0, glPixelBuffer, &clerror);
Drawing code: maps the OpenGL texture to the viewport. The entire NSOpenGLView consists of just this one texture.
    glClear(GL_COLOR_BUFFER_BIT);
    glBindTexture(GL_TEXTURE_2D, _glPixelBuffer); // <- spends cpu time here,
    glBegin(GL_QUADS);                            // <- and here
    glTexCoord2f(0., 0.); glVertex3f(-1.f,  1.f, 0.f);
    glTexCoord2f(0., hr); glVertex3f(-1.f, -1.f, 0.f);
    glTexCoord2f(wr, hr); glVertex3f( 1.f, -1.f, 0.f);
    glTexCoord2f(wr, 0.); glVertex3f( 1.f,  1.f, 0.f);
    glEnd();
    glBindTexture(GL_TEXTURE_2D, 0);
    glFlush();
After acquiring the texture memory (via clEnqueueAcquireGLObjects()), the OpenCL kernel writes data to the texture and then releases it (via clEnqueueReleaseGLObjects()). If I understand everything correctly, the texture data should never need to exist in main memory.
My question is: is it expected that so much CPU time is spent in memmove()? Does it indicate a problem in my code, or possibly a driver bug? My (unsubstantiated) suspicion is that the texture data is moving GPUx → CPU/RAM → GPUy, which I would like to avoid.