I added a new GL renderer to my engine, which uses the core profile. While it works fine on Windows and/or NVIDIA cards, it is about 10 times slower on OS X (3 frames per second instead of 30). The strange thing is that my compatibility profile renderer works fine.
I collected some traces with Instruments and the OpenGL Profiler:
https://www.dropbox.com/sh/311fg9wu0zrarzm/31CGvUcf2q
These show that the application spends its time in glDrawRangeElements. I have tried the following things:
- use glDrawElements instead (no effect)
- changing culling (no change in speed)
- disable some GL_DYNAMIC_DRAW buffers (no effect)
- bind index buffer after VAO when drawing (no effect)
- convert indices to 4 bytes (no effect)
- use GL_BGRA textures (no effect)
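For reference, the index conversion mentioned in the list above could look roughly like the sketch below. The helper name `widen_indices` is hypothetical and not taken from the engine; it only shows the idea of copying 16-bit indices into a 32-bit array before uploading them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: widen 16-bit indices to 32-bit indices.
 * Returns a malloc'ed array that the caller must free, or NULL on failure. */
uint32_t *widen_indices(const uint16_t *src, size_t count)
{
    assert(src != NULL || count == 0);

    /* Guard against overflow in the allocation size. */
    if (count > SIZE_MAX / sizeof(uint32_t))
        return NULL;

    /* Allocate at least one element so count == 0 still yields a valid pointer. */
    uint32_t *dst = malloc((count ? count : 1) * sizeof *dst);
    if (dst == NULL)
        return NULL;

    for (size_t i = 0; i < count; ++i)
        dst[i] = src[i]; /* implicit zero-extension from 16 to 32 bits */

    return dst;
}
```

After such a conversion the draw call has to be issued with GL_UNSIGNED_INT instead of GL_UNSIGNED_SHORT as the index type.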
What I have not tried is aligning my vertices to a 16-byte boundary and/or converting the indices to 4 bytes, but seriously, if that were the problem, then why the hell would the standard allow it?
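To make the alignment idea concrete, here is a minimal sketch of padding a vertex stride up to a multiple of 16 bytes. The vertex layouts and the `align_up` helper are assumptions made up for illustration, not the engine's actual format, and whether the padding changes anything on OS X is untested.

```c
#include <assert.h>
#include <stddef.h>

/* Round size up to the next multiple of alignment.
 * alignment must be a non-zero power of two. */
size_t align_up(size_t size, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (size + alignment - 1) & ~(alignment - 1);
}

/* Hypothetical tightly packed vertex: 3 + 2 floats = 20 bytes. */
typedef struct {
    float position[3];
    float texcoord[2];
} PackedVertex;

/* Same fields with explicit padding, so the stride is 32 bytes. */
typedef struct {
    float position[3];
    float texcoord[2];
    float padding[3]; /* unused, only there to reach a multiple of 16 */
} PaddedVertex;
```

With this layout, `align_up(sizeof(PackedVertex), 16)` gives the stride that would be passed to glVertexAttribPointer for the padded format, at the cost of 12 wasted bytes per vertex.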
I create a context like this:
    NSOpenGLPixelFormatAttribute attributes[] = {
        NSOpenGLPFAColorSize, 24,
        NSOpenGLPFAAlphaSize, 8,
        NSOpenGLPFADepthSize, 24,
        NSOpenGLPFAStencilSize, 8,
        NSOpenGLPFADoubleBuffer,
        NSOpenGLPFAAccelerated,
        NSOpenGLPFANoRecovery,
        NSOpenGLPFAOpenGLProfile, NSOpenGLProfileVersion3_2Core,
        0
    };

    NSOpenGLPixelFormat* format = [[NSOpenGLPixelFormat alloc] initWithAttributes:attributes];
    NSOpenGLContext* context = [[NSOpenGLContext alloc] initWithFormat:format shareContext:nil];

    [self.view setOpenGLContext:context];
    [context makeCurrentContext];
I tested on the following configurations:
- Radeon 6630M, OS X 10.7.5
- Radeon 6750M, OS X 10.7.5
- GeForce GT 330M, OS X 10.8.3
Do you have any idea what I might be doing wrong? Again, it works great with the compatibility profile (which doesn't use VAOs, though).
UPDATE: Reported to Apple.
UPDATE: Apple doesn't give a damn about the problem... Anyway, I created a small test program that actually performs well. I then compared the call stacks in Instruments and found that, when using the engine, glDrawRangeElements makes two calls:
- gleDrawArraysOrElements_ExecCore
- gleDrawArraysOrElements_Entries_Body
whereas in the test program it only makes the second one. The first call does something like immediate mode rendering (gleFlushPrimitivesTCLFunc, gleRunVertexSubmitterImmediate), so the slowdown is obvious.