Calculating PTS and DTS correctly for audio/video synchronization with FFmpeg (C++)

I am trying to mux H.264 video and G.711 PCM audio into a MOV container. I create an AVPacket from the encoded data, and initially the PTS and DTS of the video/audio frames are AV_NOPTS_VALUE, so I calculate the DTS from the current wall-clock time. My code:

 bool AudioVideoRecorder::WriteVideo(const unsigned char *pData, size_t iDataSize, bool const bIFrame) {
     .....................................
     .....................................
     .....................................
     AVPacket pkt = {0};
     av_init_packet(&pkt);

     int64_t dts = av_gettime();
     dts = av_rescale_q(dts, (AVRational){1, 1000000}, m_pVideoStream->time_base);
     int duration = 90000 / VIDEO_FRAME_RATE;
     if(m_prevVideoDts > 0LL) {
         duration = dts - m_prevVideoDts;
     }
     m_prevVideoDts = dts;

     pkt.pts = AV_NOPTS_VALUE;
     pkt.dts = m_currVideoDts;
     m_currVideoDts += duration;
     pkt.duration = duration;
     if(bIFrame) {
         pkt.flags |= AV_PKT_FLAG_KEY;
     }
     pkt.stream_index = m_pVideoStream->index;
     pkt.data = (uint8_t*) pData;
     pkt.size = iDataSize;

     int ret = av_interleaved_write_frame(m_pFormatCtx, &pkt);
     if(ret < 0) {
         LogErr("Writing video frame failed.");
         return false;
     }
     Log("Writing video frame done.");

     av_free_packet(&pkt);
     return true;
 }

 bool AudioVideoRecorder::WriteAudio(const unsigned char *pEncodedData, size_t iDataSize) {
     .................................
     .................................
     .................................
     AVPacket pkt = {0};
     av_init_packet(&pkt);

     int64_t dts = av_gettime();
     dts = av_rescale_q(dts, (AVRational){1, 1000000}, (AVRational){1, 90000});
     int duration = AUDIO_STREAM_DURATION; // 20
     if(m_prevAudioDts > 0LL) {
         duration = dts - m_prevAudioDts;
     }
     m_prevAudioDts = dts;
     pkt.pts = AV_NOPTS_VALUE;
     pkt.dts = m_currAudioDts;
     m_currAudioDts += duration;
     pkt.duration = duration;

     pkt.stream_index = m_pAudioStream->index;
     pkt.flags |= AV_PKT_FLAG_KEY;
     pkt.data = (uint8_t*) pEncodedData;
     pkt.size = iDataSize;

     int ret = av_interleaved_write_frame(m_pFormatCtx, &pkt);
     if(ret < 0) {
         LogErr("Writing audio frame failed: %d", ret);
         return false;
     }
     Log("Writing audio frame done.");

     av_free_packet(&pkt);
     return true;
 }

And I add the streams like this:

 AVStream* AudioVideoRecorder::AddMediaStream(enum AVCodecID codecID) {
     ................................
     .................................
     pStream = avformat_new_stream(m_pFormatCtx, codec);
     if (!pStream) {
         LogErr("Could not allocate stream.");
         return NULL;
     }
     pStream->id = m_pFormatCtx->nb_streams - 1;
     pCodecCtx = pStream->codec;
     pCodecCtx->codec_id = codecID;

     switch(codec->type) {
     case AVMEDIA_TYPE_VIDEO:
         pCodecCtx->bit_rate = VIDEO_BIT_RATE;
         pCodecCtx->width = PICTURE_WIDTH;
         pCodecCtx->height = PICTURE_HEIGHT;
         pStream->time_base = (AVRational){1, 90000};
         pStream->avg_frame_rate = (AVRational){90000, 1};
         pStream->r_frame_rate = (AVRational){90000, 1}; // though the frame rate is variable, around 15 fps
         pCodecCtx->pix_fmt = STREAM_PIX_FMT;
         m_pVideoStream = pStream;
         break;

     case AVMEDIA_TYPE_AUDIO:
         pCodecCtx->sample_fmt = AV_SAMPLE_FMT_S16;
         pCodecCtx->bit_rate = AUDIO_BIT_RATE;
         pCodecCtx->sample_rate = AUDIO_SAMPLE_RATE;
         pCodecCtx->channels = 1;
         m_pAudioStream = pStream;
         break;

     default:
         break;
     }

     /* Some formats want stream headers to be separate. */
     if (m_pOutputFmt->flags & AVFMT_GLOBALHEADER)
         m_pFormatCtx->flags |= CODEC_FLAG_GLOBAL_HEADER;

     return pStream;
 }

There are several problems with this calculation:

  • Video lags behind the audio, and the lag grows over time.

  • Suppose an audio frame arrives ( WriteAudio(..) ) a little late, say by 3 seconds. Then the late frame should start playing with a 3-second gap, but it does not: the delayed frame is played back immediately after the previous frame.

  • Sometimes I record for ~40 seconds, but the reported file duration is much longer, e.g. 2 minutes; however, audio/video plays for only the actual ~40 seconds, the rest of the file contains nothing, and playback jumps straight to the end right after 40 seconds (checked in VLC).

EDIT:

As suggested by Ronald S. Bultje, I understood that:

 m_pAudioStream->time_base = (AVRational){1, 9000}; // actually no need to set it, as 9000 is already the default for audio, as you said
 m_pVideoStream->time_base = (AVRational){1, 9000};

should be set, since both the audio and the video stream are then in the same time-base units.

And for the video:

 ...................
 ...................
 int64_t dts = av_gettime(); // get current time in microseconds
 dts *= 9000;
 dts /= 1000000; // 1 second = 10^6 microseconds
 pkt.pts = AV_NOPTS_VALUE; // is that okay?
 pkt.dts = dts;
 // and no need to set pkt.duration, right?

And for audio (just like the video, right?):

 ...................
 ...................
 int64_t dts = av_gettime(); // get current time in microseconds
 dts *= 9000;
 dts /= 1000000; // 1 second = 10^6 microseconds
 pkt.pts = AV_NOPTS_VALUE; // is that okay?
 pkt.dts = dts;
 // and no need to set pkt.duration, right?

And I think both streams now effectively share the same currDts, right? Please correct me if I am mistaken somewhere or missing something.

Also, if I want to use a video stream time base of (AVRational){1, frameRate} and an audio stream time base of (AVRational){1, sampleRate} , what should the correct code look like?

EDIT 2.0:

 m_pAudioStream->time_base = (AVRational){1, VIDEO_FRAME_RATE};
 m_pVideoStream->time_base = (AVRational){1, VIDEO_FRAME_RATE};

AND

 bool AudioVideoRecorder::WriteAudio(const unsigned char *pEncodedData, size_t iDataSize) {
     ...........................
     ......................
     AVPacket pkt = {0};
     av_init_packet(&pkt);

     int64_t dts = av_gettime() / 1000; // convert into milliseconds
     dts = dts * VIDEO_FRAME_RATE;
     if(m_dtsOffset < 0) {
         m_dtsOffset = dts;
     }

     pkt.pts = AV_NOPTS_VALUE;
     pkt.dts = (dts - m_dtsOffset);
     pkt.stream_index = m_pAudioStream->index;
     pkt.flags |= AV_PKT_FLAG_KEY;
     pkt.data = (uint8_t*) pEncodedData;
     pkt.size = iDataSize;

     int ret = av_interleaved_write_frame(m_pFormatCtx, &pkt);
     if(ret < 0) {
         LogErr("Writing audio frame failed: %d", ret);
         return false;
     }
     Log("Writing audio frame done.");

     av_free_packet(&pkt);
     return true;
 }

 bool AudioVideoRecorder::WriteVideo(const unsigned char *pData, size_t iDataSize, bool const bIFrame) {
     ........................................
     .................................
     AVPacket pkt = {0};
     av_init_packet(&pkt);

     int64_t dts = av_gettime() / 1000;
     dts = dts * VIDEO_FRAME_RATE;
     if(m_dtsOffset < 0) {
         m_dtsOffset = dts;
     }
     pkt.pts = AV_NOPTS_VALUE;
     pkt.dts = (dts - m_dtsOffset);
     if(bIFrame) {
         pkt.flags |= AV_PKT_FLAG_KEY;
     }
     pkt.stream_index = m_pVideoStream->index;
     pkt.data = (uint8_t*) pData;
     pkt.size = iDataSize;

     int ret = av_interleaved_write_frame(m_pFormatCtx, &pkt);
     if(ret < 0) {
         LogErr("Writing video frame failed.");
         return false;
     }
     Log("Writing video frame done.");

     av_free_packet(&pkt);
     return true;
 }

Is this last change correct? Video and audio seem synchronized. The only remaining problem is that the audio is played without a gap, regardless of how late a packet arrives. Like this:

Packet arrival: 1 2 3 4 ... (the next packet arrives 3 seconds later) ... 5

Audio playback: 1 2 3 4 (no gap) 5

EDIT 3.0:

Zeroed-out audio sample data:

 AVFrame* pSilentData;
 pSilentData = av_frame_alloc();
 memset(&pSilentData->data[0], 0, iDataSize);
 pkt.data = (uint8_t*) pSilentData;
 pkt.size = iDataSize;
 av_freep(&pSilentData->data[0]);
 av_frame_free(&pSilentData);

Is this okay? But after muxing this into the container, playback has a periodic clicking noise. What is the problem?

EDIT 4.0:

Well, for μ-law audio, a zero sample is represented as 0xff . So:

 memset(&pSilentData->data[0], 0xff, iDataSize); 

solves my problem.

1 answer

Timestamps (such as dts ) must be in AVStream.time_base units. You are requesting a video time base of 1/90000 and the default audio time base of 1/9000, but you are using a time base of 1/100000 to write your dts values. I am also not sure it is guaranteed that the requested time bases are kept during header writing; your muxer may change the values and expect you to deal with the new values.

So this code:

 int64_t dts = av_gettime();
 dts = av_rescale_q(dts, (AVRational){1, 1000000}, (AVRational){1, 90000});
 int duration = AUDIO_STREAM_DURATION; // 20
 if(m_prevAudioDts > 0LL) {
     duration = dts - m_prevAudioDts;
 }

will not work. Change it to something that uses the audio stream's time base, and do not set a duration unless you know what you are doing. (The same applies to the video.)

 m_prevAudioDts = dts;
 pkt.pts = AV_NOPTS_VALUE;
 pkt.dts = m_currAudioDts;
 m_currAudioDts += duration;
 pkt.duration = duration;

This looks scary, especially in combination with the similar video code. The problem is that the first packet of each stream will have a timestamp of zero, regardless of the inter-packet delay between the streams. You need one parent currDts shared by all streams, otherwise your streams will be perpetually out of sync.

[edit]

So, regarding your edit: if you have gaps in the audio, I think you need to insert silence (zeroed-out audio sample data) for the duration of the gap.


Source: https://habr.com/ru/post/1257501/

