After going through RTPFragmentize() in h264_encoder_impl, I figured it out.
An encoded frame contains several NALUs of different types, including AUD, SPS (67), PPS (68) and IDR. Each NALU is preceded by a 4-byte start code, 00 00 00 01.
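The 67 and 68 mentioned above are the first byte of each NALU in hex; the NALU type is the low five bits of that byte. As a small illustration (the names `NaluType` and `ParseNaluType` are made up for this sketch and are not taken from the WebRTC sources):

```cpp
#include <cstdint>

// NAL unit types relevant here, as numbered in the H.264 specification.
enum NaluType : uint8_t {
  kSlice = 1,  // non-IDR slice
  kIdr = 5,    // IDR slice
  kSps = 7,    // sequence parameter set
  kPps = 8,    // picture parameter set
  kAud = 9,    // access unit delimiter
};

// The first byte after the start code is the NAL header:
// 1 bit forbidden_zero, 2 bits nal_ref_idc, 5 bits nal_unit_type.
constexpr uint8_t ParseNaluType(uint8_t nalu_header) {
  return nalu_header & 0x1F;
}

static_assert(ParseNaluType(0x67) == kSps, "0x67 is an SPS header byte");
static_assert(ParseNaluType(0x68) == kPps, "0x68 is a PPS header byte");
```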
For OpenH264, the header of the first frame looked like this:
[ 00 00 00 01 67 42 c0 20 8c 8d 40 20 03 09 00 f0
88 46 a0 00 00 00 01 68 ce 3c 80] 00 00 00 01 ..
Each 00 00 00 01 sequence is a start code. Only the bytes between the square brackets belong to the header; the last start code introduces the frame data.
RTPFragmentationHeader for above:
frag_header->fragmentationVectorSize = 3   // 2 fragments for header
                                           // 3rd fragment for frame buffer
frag_header->fragmentationOffset[0] = 4
frag_header->fragmentationLength[0] = 15
frag_header->fragmentationOffset[1] = 23   // 4 + 15 + sizeof(startcode)
frag_header->fragmentationLength[1] = 4
frag_header->fragmentationOffset[2] = 31
frag_header->fragmentationLength[2] = 43218   // last fragment is frame buffer
In subsequent frames there was always only one fragment, which looked like this:
00 00 00 01 67 b8 .. .. ..
encoded_image->_length is the size of the actual encoded frame data, and
encoded_image->_size is the maximum size (capacity) of the encoded frame buffer.
The OpenH264 API reports the number of NALUs in the encoded frame, which is used to calculate the fragments. The API I was using only provided the header and its size, regardless of whether the header had actually been prepended to the frame. Searching only the first header-size bytes of the frame for start codes allowed the fragmentation to be calculated correctly.
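A minimal sketch of that header-limited search, under a few assumptions: `Fragment` and `ComputeFragments` are hypothetical names standing in for the offset/length pairs of RTPFragmentationHeader, `header_size` is the header size reported by the encoder API, and the scan window is extended by one start code so that the start code of the frame data, which directly follows the header, is also found. This is not the code I actually used, only an illustration of the idea.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for one offset/length pair of RTPFragmentationHeader.
struct Fragment {
  size_t offset;  // first payload byte, just past the start code
  size_t length;  // payload bytes, start code excluded
};

constexpr size_t kStartCodeSize = 4;

// True if a 4-byte start code (00 00 00 01) begins at data[pos].
bool IsStartCode(const uint8_t* data, size_t size, size_t pos) {
  return pos + kStartCodeSize <= size && data[pos] == 0 &&
         data[pos + 1] == 0 && data[pos + 2] == 0 && data[pos + 3] == 1;
}

// Scans only the first header_size bytes (plus the start code that follows
// them) for start codes. Everything after the last start code found is
// treated as a single fragment, so the frame data itself is never scanned.
std::vector<Fragment> ComputeFragments(const uint8_t* data, size_t length,
                                       size_t header_size) {
  assert(data != nullptr || length == 0);
  std::vector<Fragment> fragments;
  const size_t scan_end = std::min(length, header_size + kStartCodeSize);
  size_t pos = 0;
  while (pos < scan_end) {
    if (IsStartCode(data, scan_end, pos)) {
      // Close the previous fragment at the beginning of this start code.
      if (!fragments.empty()) {
        fragments.back().length = pos - fragments.back().offset;
      }
      fragments.push_back({pos + kStartCodeSize, 0});
      pos += kStartCodeSize;
    } else {
      ++pos;
    }
  }
  // The last fragment runs to the end of the encoded data.
  if (!fragments.empty()) {
    fragments.back().length = length - fragments.back().offset;
  }
  return fragments;
}
```

For a buffer laid out as in the RTPFragmentationHeader listing above (a 27-byte header followed by the start code of the frame data), calling this with header_size = 27 gives fragments at offsets 4, 23 and 31. For a frame that starts directly with a single start code and no header, it gives one fragment at offset 4.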
With this change the encoded data was finally sent, and it was decoded correctly in the client browser.