Calculating PTS and DTS correctly for audio/video synchronization with FFmpeg (C++)

I am trying to mux H.264 video and G.711 PCM audio into a MOV container. I create an AVPacket from the encoded data, and initially the PTS and DTS of the video/audio frames are AV_NOPTS_VALUE, so I calculate the DTS from the current wall-clock time. My code:

 bool AudioVideoRecorder::WriteVideo(const unsigned char *pData, size_t iDataSize, bool const bIFrame) {
     .....................................
     .....................................
     .....................................
     AVPacket pkt = {0};
     av_init_packet(&pkt);

     int64_t dts = av_gettime();
     dts = av_rescale_q(dts, (AVRational){1, 1000000}, m_pVideoStream->time_base);
     int duration = 90000 / VIDEO_FRAME_RATE;
     if(m_prevVideoDts > 0LL) {
         duration = dts - m_prevVideoDts;
     }
     m_prevVideoDts = dts;

     pkt.pts = AV_NOPTS_VALUE;
     pkt.dts = m_currVideoDts;
     m_currVideoDts += duration;
     pkt.duration = duration;
     if(bIFrame) {
         pkt.flags |= AV_PKT_FLAG_KEY;
     }
     pkt.stream_index = m_pVideoStream->index;
     pkt.data = (uint8_t*) pData;
     pkt.size = iDataSize;

     int ret = av_interleaved_write_frame(m_pFormatCtx, &pkt);
     if(ret < 0) {
         LogErr("Writing video frame failed.");
         return false;
     }
     Log("Writing video frame done.");

     av_free_packet(&pkt);
     return true;
 }

 bool AudioVideoRecorder::WriteAudio(const unsigned char *pEncodedData, size_t iDataSize) {
     .................................
     .................................
     .................................
     AVPacket pkt = {0};
     av_init_packet(&pkt);

     int64_t dts = av_gettime();
     dts = av_rescale_q(dts, (AVRational){1, 1000000}, (AVRational){1, 90000});
     int duration = AUDIO_STREAM_DURATION; // 20
     if(m_prevAudioDts > 0LL) {
         duration = dts - m_prevAudioDts;
     }
     m_prevAudioDts = dts;
     pkt.pts = AV_NOPTS_VALUE;
     pkt.dts = m_currAudioDts;
     m_currAudioDts += duration;
     pkt.duration = duration;

     pkt.stream_index = m_pAudioStream->index;
     pkt.flags |= AV_PKT_FLAG_KEY;
     pkt.data = (uint8_t*) pEncodedData;
     pkt.size = iDataSize;

     int ret = av_interleaved_write_frame(m_pFormatCtx, &pkt);
     if(ret < 0) {
         LogErr("Writing audio frame failed: %d", ret);
         return false;
     }
     Log("Writing audio frame done.");

     av_free_packet(&pkt);
     return true;
 }

And I add the streams like this:

 AVStream* AudioVideoRecorder::AddMediaStream(enum AVCodecID codecID) {
     ................................
     .................................
     pStream = avformat_new_stream(m_pFormatCtx, codec);
     if (!pStream) {
         LogErr("Could not allocate stream.");
         return NULL;
     }
     pStream->id = m_pFormatCtx->nb_streams - 1;
     pCodecCtx = pStream->codec;
     pCodecCtx->codec_id = codecID;

     switch(codec->type) {
     case AVMEDIA_TYPE_VIDEO:
         pCodecCtx->bit_rate = VIDEO_BIT_RATE;
         pCodecCtx->width = PICTURE_WIDTH;
         pCodecCtx->height = PICTURE_HEIGHT;
         pStream->time_base = (AVRational){1, 90000};
         pStream->avg_frame_rate = (AVRational){90000, 1};
         pStream->r_frame_rate = (AVRational){90000, 1}; // though the frame rate is variable, around 15 fps
         pCodecCtx->pix_fmt = STREAM_PIX_FMT;
         m_pVideoStream = pStream;
         break;

     case AVMEDIA_TYPE_AUDIO:
         pCodecCtx->sample_fmt = AV_SAMPLE_FMT_S16;
         pCodecCtx->bit_rate = AUDIO_BIT_RATE;
         pCodecCtx->sample_rate = AUDIO_SAMPLE_RATE;
         pCodecCtx->channels = 1;
         m_pAudioStream = pStream;
         break;

     default:
         break;
     }

     /* Some formats want stream headers to be separate. */
     if (m_pOutputFmt->flags & AVFMT_GLOBALHEADER)
         m_pFormatCtx->flags |= CODEC_FLAG_GLOBAL_HEADER;

     return pStream;
 }

There are several problems with this calculation:

  • Video lags behind the audio, and the lag grows over time.

  • Suppose an audio frame arrives ( WriteAudio(..) ) a little late, say by 3 seconds. Then the late frame should start playing with a 3-second gap, but it does not: the delayed frame is played back immediately after the previous frame.

  • Sometimes I record for ~40 seconds, but the reported file duration is much longer, e.g. 2 minutes; however, audio/video plays for only the actual ~40 seconds, the rest of the file contains nothing, and playback jumps straight to the end right after 40 seconds (checked in VLC).

EDIT:

As suggested by Ronald S. Bultje, I understood that:

 m_pAudioStream->time_base = (AVRational){1, 9000}; // actually no need to set it, as 9000 is already the default for audio, as you said
 m_pVideoStream->time_base = (AVRational){1, 9000};

should be set, since both the audio and the video stream are then in the same time-base units.

And for the video:

 ...................
 ...................
 int64_t dts = av_gettime(); // get current time in microseconds
 dts *= 9000;
 dts /= 1000000; // 1 second = 10^6 microseconds
 pkt.pts = AV_NOPTS_VALUE; // is that okay?
 pkt.dts = dts;
 // and no need to set pkt.duration, right?

And for audio (just like the video, right?):

 ...................
 ...................
 int64_t dts = av_gettime(); // get current time in microseconds
 dts *= 9000;
 dts /= 1000000; // 1 second = 10^6 microseconds
 pkt.pts = AV_NOPTS_VALUE; // is that okay?
 pkt.dts = dts;
 // and no need to set pkt.duration, right?

And I think both streams now effectively share the same currDts, right? Please correct me if I am mistaken somewhere or missing something.

Also, if I want to use a video stream time base of (AVRational){1, frameRate} and an audio stream time base of (AVRational){1, sampleRate} , what should the correct code look like?

EDIT 2.0:

 m_pAudioStream->time_base = (AVRational){1, VIDEO_FRAME_RATE};
 m_pVideoStream->time_base = (AVRational){1, VIDEO_FRAME_RATE};

AND

 bool AudioVideoRecorder::WriteAudio(const unsigned char *pEncodedData, size_t iDataSize) {
     ...........................
     ......................
     AVPacket pkt = {0};
     av_init_packet(&pkt);

     int64_t dts = av_gettime() / 1000; // convert into milliseconds
     dts = dts * VIDEO_FRAME_RATE;
     if(m_dtsOffset < 0) {
         m_dtsOffset = dts;
     }

     pkt.pts = AV_NOPTS_VALUE;
     pkt.dts = (dts - m_dtsOffset);
     pkt.stream_index = m_pAudioStream->index;
     pkt.flags |= AV_PKT_FLAG_KEY;
     pkt.data = (uint8_t*) pEncodedData;
     pkt.size = iDataSize;

     int ret = av_interleaved_write_frame(m_pFormatCtx, &pkt);
     if(ret < 0) {
         LogErr("Writing audio frame failed: %d", ret);
         return false;
     }
     Log("Writing audio frame done.");

     av_free_packet(&pkt);
     return true;
 }

 bool AudioVideoRecorder::WriteVideo(const unsigned char *pData, size_t iDataSize, bool const bIFrame) {
     ........................................
     .................................
     AVPacket pkt = {0};
     av_init_packet(&pkt);

     int64_t dts = av_gettime() / 1000;
     dts = dts * VIDEO_FRAME_RATE;
     if(m_dtsOffset < 0) {
         m_dtsOffset = dts;
     }
     pkt.pts = AV_NOPTS_VALUE;
     pkt.dts = (dts - m_dtsOffset);
     if(bIFrame) {
         pkt.flags |= AV_PKT_FLAG_KEY;
     }
     pkt.stream_index = m_pVideoStream->index;
     pkt.data = (uint8_t*) pData;
     pkt.size = iDataSize;

     int ret = av_interleaved_write_frame(m_pFormatCtx, &pkt);
     if(ret < 0) {
         LogErr("Writing video frame failed.");
         return false;
     }
     Log("Writing video frame done.");

     av_free_packet(&pkt);
     return true;
 }

Is this last change correct? Video and audio seem synchronized. The only remaining problem is that the audio is played without a gap, regardless of how late a packet arrives. Like this:

Packet arrival: 1 2 3 4 ... (the next packet arrives 3 seconds later) ... 5

Audio playback: 1 2 3 4 (no gap) 5

EDIT 3.0:

Zeroed-out audio sample data:

 AVFrame* pSilentData;
 pSilentData = av_frame_alloc();
 memset(&pSilentData->data[0], 0, iDataSize);
 pkt.data = (uint8_t*) pSilentData;
 pkt.size = iDataSize;
 av_freep(&pSilentData->data[0]);
 av_frame_free(&pSilentData);

Is this okay? But after muxing this into the container, playback has a periodic clicking noise. What is the problem?

EDIT 4.0:

Well, for μ-law audio, a zero sample is represented as 0xff . So:

 memset(&pSilentData->data[0], 0xff, iDataSize); 

solves my problem.

1 answer

Timestamps (such as dts ) must be in AVStream.time_base units. You are requesting a video time base of 1/90000 and the default audio time base of 1/9000, but you are using a time base of 1/100000 to write your dts values. I am also not sure it is guaranteed that the requested time bases are kept during header writing; your muxer may change the values and expect you to deal with the new values.

So this code:

 int64_t dts = av_gettime();
 dts = av_rescale_q(dts, (AVRational){1, 1000000}, (AVRational){1, 90000});
 int duration = AUDIO_STREAM_DURATION; // 20
 if(m_prevAudioDts > 0LL) {
     duration = dts - m_prevAudioDts;
 }

will not work. Change it to something that uses the audio stream's time base, and do not set a duration unless you know what you are doing. (The same applies to the video.)

 m_prevAudioDts = dts;
 pkt.pts = AV_NOPTS_VALUE;
 pkt.dts = m_currAudioDts;
 m_currAudioDts += duration;
 pkt.duration = duration;

This looks scary, especially in combination with the similar video code. The problem is that the first packet of each stream will have a timestamp of zero, regardless of the inter-packet delay between the streams. You need one parent currDts shared by all streams, otherwise your streams will be perpetually out of sync.

[edit]

So, regarding your edit: if you have gaps in the audio, I think you need to insert silence (zeroed-out audio sample data) for the duration of the gap.


Source: https://habr.com/ru/post/1257501/

