Faster real-time 3D graphics encoding with OpenGL and x264

I am working on a system that streams compressed video of 3D graphics rendered on the server to a client as soon as each frame is displayed. The code already works, but I feel it could be much faster (and this is already a bottleneck in the system).

Here is what I'm doing:

First I grab the framebuffer:

glReadBuffer( GL_FRONT );
glReadPixels( 0, 0, width, height, GL_RGB, GL_UNSIGNED_BYTE, buffer );
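One thing worth checking at this step: with GL_RGB and the default GL_PACK_ALIGNMENT of 4, glReadPixels pads each row up to a multiple of 4 bytes whenever width * 3 is not already one, which silently skews the image downstream. Either call glPixelStorei(GL_PACK_ALIGNMENT, 1) before reading, or account for the padding. A sketch of the row-size arithmetic (paddedRowSize is an illustrative helper, not part of the code above):

```cpp
#include <cstddef>

// Bytes glReadPixels actually writes per row for a given GL_PACK_ALIGNMENT.
// With GL_RGB the tight row size is width * 3; the driver rounds it up to
// the next multiple of the alignment (default 4). Avoid the padding with
//   glPixelStorei(GL_PACK_ALIGNMENT, 1);
// or use this size when walking the buffer.
size_t paddedRowSize(size_t width, size_t bytesPerPixel, size_t alignment) {
    size_t tight = width * bytesPerPixel;
    return (tight + alignment - 1) / alignment * alignment;
}
```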

Then I flip the framebuffer, because of a quirk with sws_scale (which I use for the color-space conversion) that otherwise produces a vertically flipped image. I flip it in place, nothing fancy:

void VerticalFlip(int width, int height, byte* pixelData, int bitsPerPixel)
{
    // Note: despite the name, bitsPerPixel is used as *bytes* per pixel here
    // (width * bitsPerPixel is treated as the row size in bytes).
    byte* temp = new byte[width * bitsPerPixel];
    height--; // rows are indexed 0 .. height-1
    for (int y = 0; y < (height + 1) / 2; y++)
    {
        // swap row y with row (height - y)
        memcpy(temp, &pixelData[y * width * bitsPerPixel], width * bitsPerPixel);
        memcpy(&pixelData[y * width * bitsPerPixel],
               &pixelData[(height - y) * width * bitsPerPixel],
               width * bitsPerPixel);
        memcpy(&pixelData[(height - y) * width * bitsPerPixel], temp, width * bitsPerPixel);
    }
    delete[] temp;
}

Then I convert it to YUV420p:

convertCtx = sws_getContext(width, height, PIX_FMT_RGB24,
                            width, height, PIX_FMT_YUV420P,
                            SWS_FAST_BILINEAR, NULL, NULL, NULL);
uint8_t* src[3] = { buffer, NULL, NULL };
sws_scale(convertCtx, src, &srcstride, 0, height,
          pic_in.img.plane, pic_in.img.i_stride);
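As an aside, the "flipped output" behavior is libswscale honoring row order, and the CPU-side flip can usually be avoided entirely: point the source plane at the last row of the RGB buffer and pass a negative stride, so sws_scale walks the image bottom-up and emits it upright. A sketch (flippedSrc is an illustrative helper; the sws_scale call mirrors the one above):

```cpp
#include <cstdint>
#include <utility>

// Returns { pointer to the last row, negative stride } so libswscale walks
// the RGB buffer bottom-up and emits an upright image -- no memcpy flip.
std::pair<uint8_t*, int> flippedSrc(uint8_t* rgb, int height, int stride) {
    return { rgb + (height - 1) * stride, -stride };
}

// Usage with the context from above (sketch):
//   uint8_t* src[3]       = { flippedSrc(buffer, height, width * 3).first, NULL, NULL };
//   int      srcStride[3] = { -(width * 3), 0, 0 };
//   sws_scale(convertCtx, src, srcStride, 0, height,
//             pic_in.img.plane, pic_in.img.i_stride);
```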

Then I pretty much just call the x264 encoder. I already use the zerolatency preset.

 int frame_size = x264_encoder_encode(_encoder, &nals, &i_nals, _inputPicture, &pic_out); 
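For reference, a minimal sketch of a low-latency x264 setup using the x264 C API; the exact values here are illustrative assumptions, not the author's configuration:

```cpp
// Minimal low-latency x264 configuration (sketch; values are illustrative).
x264_param_t param;
x264_param_default_preset(&param, "ultrafast", "zerolatency");
param.i_width  = width;
param.i_height = height;
param.i_csp    = X264_CSP_I420;   // matches the YUV420p conversion above
param.b_repeat_headers = 1;       // resend SPS/PPS with keyframes, useful for streaming
x264_param_apply_profile(&param, "baseline");
x264_t* _encoder = x264_encoder_open(&param);
```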

I suppose there should be a faster way to do this: capture the frame and convert it to YUV420p. It would be nice to do the YUV420p conversion on the GPU and only then copy the result to system memory, and I hope there is a way to do the color conversion without having to flip the image first.

If there is no better way, at least this question may help someone who is trying to do this do it the same way I do.

1 answer

First, use asynchronous readback with PBOs (pixel buffer objects). Here is an example. This speeds up the read by using two PBOs that work asynchronously, without the stall you get when calling glReadPixels directly. In my application I got an 80% performance improvement when switching to PBOs. Also, on some GPUs glGetTexImage() is faster than glReadPixels(), so give it a try.
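To make the two-PBO suggestion concrete, here is a sketch; names like pbo, writeIdx, and readIdx are illustrative, and the two buffers are assumed to be created with glGenBuffers and sized width * height * 3 with GL_STREAM_READ usage:

```cpp
// Two-PBO asynchronous readback, sketched: frame N is read into one PBO
// while the CPU maps the other PBO, which holds frame N-1. glReadPixels
// with a bound GL_PIXEL_PACK_BUFFER returns immediately (the copy is
// queued on the GPU), so the CPU never stalls waiting for the transfer.
//
//   glBindBuffer(GL_PIXEL_PACK_BUFFER, pbo[writeIdx]);
//   glReadPixels(0, 0, width, height, GL_RGB, GL_UNSIGNED_BYTE, 0); // async copy
//
//   glBindBuffer(GL_PIXEL_PACK_BUFFER, pbo[readIdx]);
//   void* pixels = glMapBuffer(GL_PIXEL_PACK_BUFFER, GL_READ_ONLY); // frame N-1
//   if (pixels) { /* hand off to sws_scale / x264 */ }
//   glUnmapBuffer(GL_PIXEL_PACK_BUFFER);
//   glBindBuffer(GL_PIXEL_PACK_BUFFER, 0);

// Swap the roles of the two PBOs each frame:
int nextPboIndex(int idx) { return (idx + 1) % 2; }
```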

But if you really want to take video encoding to the next level, you can do it through CUDA using the Nvidia Codec Library. I recently asked the same question, so this one might be helpful.


Source: https://habr.com/ru/post/1437641/

