WebRTC: What is RTPFragmentationHeader in an encoder implementation?

I modified h264_encoder_impl to use the NVIDIA GRID hardware encoder, replacing the OpenH264 calls with NVIDIA API calls. The encoded stream can be written to a file successfully, but filling in _buffer and _size of encoded_image_ is not enough; you also need to fill in the RTPFragmentationHeader.

    // RtpFragmentize(EncodedImage* encoded_image,
    //                std::unique_ptr<uint8_t[]>* encoded_image_buffer,
    //                const VideoFrameBuffer& frame_buffer,
    //                SFrameBSInfo* info,
    //                RTPFragmentationHeader* frag_header)

    // encode
    openh264_->Encode(input, &info /*out*/);

    // fragmentize ?
    RtpFragmentize(&encoded_image_ /*out*/, &encoded_image_buffer_,
                   *frame_buffer, &info, &frag_header /*out*/);

    // ...

    // send
    encoded_image_callback_->OnEncodedImage(encoded_image_, &codec_specific,
                                            &frag_header);

The current OpenH264-based implementation fills frag_header in RtpFragmentize(), and VP8 fills it differently. I see something with NAL units and layers that also computes encoded_image->_length, but I have no idea how to do this.

I can’t find the documentation anywhere. The implementations of VP8 and OpenH264 are all I have.

So what is an RTPFragmentationHeader? What does it do? What is encoded_image->_length? How do I fill them in correctly when using a custom H264 encoder? I can find the source code, but what next? How do I fill in all its members?

1 answer

After going through RtpFragmentize() in h264_encoder_impl, I figured it out.

An encoded frame contains several NALUs of various types, including AUD, SPS (67), PPS (68) and IDR. Each NALU is preceded by the 4-byte start code 00 00 00 01.

For OpenH264, the header of the first frame looked like this:

  [ 00 00 00 01 67 42 c0 20 8c 8d 40 20 03 09 00 f0  
  88 46 a0 00 00 00 00 01 68 ce 3c 80] 00 00 00 01 .. 

The start codes are the 00 00 00 01 sequences. Only the bytes between the square brackets belong to the header; the last start code introduces the frame data.

The RTPFragmentationHeader for the above:

    frag_header->fragmentationVectorSize = 3;     // 2 fragments for the header,
                                                  // 3rd fragment for the frame buffer
    frag_header->fragmentationOffset[0] = 4;
    frag_header->fragmentationLength[0] = 15;
    frag_header->fragmentationOffset[1] = 23;     // 4 + 15 + sizeof(start code)
    frag_header->fragmentationLength[1] = 4;
    frag_header->fragmentationOffset[2] = 31;
    frag_header->fragmentationLength[2] = 43218;  // last fragment is the frame buffer

Subsequent frames always contained only one fragment, which looked like this:

 00 00 00 01 67 b8 .. .. .. 
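Not from the original answer, but to make the start-code scanning concrete, a minimal sketch of locating NALUs could look like this (NaluRange and FindNalus are made-up names for illustration, not WebRTC API):

    // Sketch only: find NAL units by scanning for the 4-byte start code
    // 00 00 00 01. NaluRange and FindNalus are illustrative names.
    #include <cstddef>
    #include <cstdint>
    #include <vector>

    struct NaluRange {
      size_t offset;  // first byte after the start code
      size_t length;  // payload length up to the next start code (or end of buffer)
    };

    std::vector<NaluRange> FindNalus(const uint8_t* buf, size_t size) {
      std::vector<NaluRange> nalus;
      size_t i = 0;
      while (i + 4 <= size) {
        if (buf[i] == 0 && buf[i + 1] == 0 && buf[i + 2] == 0 && buf[i + 3] == 1) {
          if (!nalus.empty())
            nalus.back().length = i - nalus.back().offset;  // close previous NALU
          nalus.push_back({i + 4, 0});
          i += 4;
        } else {
          ++i;
        }
      }
      if (!nalus.empty())
        nalus.back().length = size - nalus.back().offset;  // last NALU runs to the end
      return nalus;
    }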

encoded_image->_length is the size of the actual encoded frame data, and encoded_image->_size is the maximum (allocated) size of the encoded frame buffer.
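As a rough fragment from the encoder's Encode() path (assuming the older WebRTC EncodedImage with the _buffer/_size/_length fields used above; bitstream and bitstream_size are hypothetical names for the hardware encoder output, and <cstring> is needed for memcpy):

    // Sketch: copy the hardware encoder output into the EncodedImage.
    // bitstream / bitstream_size are hypothetical names for the HW encoder output.
    if (encoded_image_._size < bitstream_size) {
      encoded_image_buffer_.reset(new uint8_t[bitstream_size]);
      encoded_image_._buffer = encoded_image_buffer_.get();
      encoded_image_._size = bitstream_size;      // capacity of the allocation
    }
    memcpy(encoded_image_._buffer, bitstream, bitstream_size);
    encoded_image_._length = bitstream_size;      // actual encoded payload size (<= _size)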

The OpenH264 API reports the number of NALUs in the encoded frame, which is used to compute the fragments, while the API I used only provided the header and its size, regardless of whether the header was actually prepended to the frame or not. Scanning the frame bytes for start codes only up to the header size allowed the fragmentation to be computed correctly.
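A hedged sketch of that idea, reusing the FindNalus() helper above and assuming the older WebRTC RTPFragmentationHeader with its VerifyAndAllocateFragmentationHeader() method (header_size stands for whatever SPS/PPS header size the hardware encoder API reports):

    // Sketch only (illustrative). RTPFragmentationHeader comes from the older
    // WebRTC module_common_types.h; adjust to your tree.
    #include <algorithm>

    void FillFragmentation(const uint8_t* buf, size_t size, size_t header_size,
                           webrtc::RTPFragmentationHeader* frag_header) {
      // Scan only the reported header region plus the frame's own start code.
      // Annex B emulation prevention guarantees 00 00 00 01 cannot occur inside
      // a NALU payload, so the rest of the buffer is a single fragment anyway.
      size_t scan_limit = std::min(size, header_size + 4);
      std::vector<NaluRange> nalus = FindNalus(buf, scan_limit);
      // Extend the last detected NALU to the end of the buffer: the remainder
      // is the encoded frame itself and needs no further scanning.
      if (!nalus.empty())
        nalus.back().length = size - nalus.back().offset;
      frag_header->VerifyAndAllocateFragmentationHeader(nalus.size());
      for (size_t i = 0; i < nalus.size(); ++i) {
        frag_header->fragmentationOffset[i] = nalus[i].offset;  // NALU payload start
        frag_header->fragmentationLength[i] = nalus[i].length;  // bytes in this NALU
      }
    }

On keyframes this yields the SPS/PPS fragments plus the frame fragment, as in the example above; on other frames only a single start code is found in the scanned region, so the whole buffer becomes one fragment, matching the single-fragment frames described earlier.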

With this in place, the encoded data was finally sent and decoded correctly in the client browser.


Source: https://habr.com/ru/post/1269991/

