I am trying to transfer data in H264 format and G711 PCM data to mov multimedia container. I create an AVPacket from encoded data, and initially the value of the PTS and DTS video / audio frames is equivalent to AV_NOPTS_VALUE . Therefore, I calculated DTS using current time information. My code is
bool AudioVideoRecorder::WriteVideo(const unsigned char *pData, size_t iDataSize, bool const bIFrame) { ..................................... ..................................... ..................................... AVPacket pkt = {0}; av_init_packet(&pkt); int64_t dts = av_gettime(); dts = av_rescale_q(dts, (AVRational){1, 1000000}, m_pVideoStream->time_base); int duration = 90000 / VIDEO_FRAME_RATE; if(m_prevVideoDts > 0LL) { duration = dts - m_prevVideoDts; } m_prevVideoDts = dts; pkt.pts = AV_NOPTS_VALUE; pkt.dts = m_currVideoDts; m_currVideoDts += duration; pkt.duration = duration; if(bIFrame) { pkt.flags |= AV_PKT_FLAG_KEY; } pkt.stream_index = m_pVideoStream->index; pkt.data = (uint8_t*) pData; pkt.size = iDataSize; int ret = av_interleaved_write_frame(m_pFormatCtx, &pkt); if(ret < 0) { LogErr("Writing video frame failed."); return false; } Log("Writing video frame done."); av_free_packet(&pkt); return true; } bool AudioVideoRecorder::WriteAudio(const unsigned char *pEncodedData, size_t iDataSize) { ................................. ................................. ................................. AVPacket pkt = {0}; av_init_packet(&pkt); int64_t dts = av_gettime(); dts = av_rescale_q(dts, (AVRational){1, 1000000}, (AVRational){1, 90000}); int duration = AUDIO_STREAM_DURATION;
And I added a stream like this -
AVStream* AudioVideoRecorder::AddMediaStream(enum AVCodecID codecID) { ................................ ................................. pStream = avformat_new_stream(m_pFormatCtx, codec); if (!pStream) { LogErr("Could not allocate stream."); return NULL; } pStream->id = m_pFormatCtx->nb_streams - 1; pCodecCtx = pStream->codec; pCodecCtx->codec_id = codecID; switch(codec->type) { case AVMEDIA_TYPE_VIDEO: pCodecCtx->bit_rate = VIDEO_BIT_RATE; pCodecCtx->width = PICTURE_WIDTH; pCodecCtx->height = PICTURE_HEIGHT; pStream->time_base = (AVRational){1, 90000}; pStream->avg_frame_rate = (AVRational){90000, 1}; pStream->r_frame_rate = (AVRational){90000, 1};
There are several problems with this calculation:
Video is delayed and lags behind sound over time.
Suppose the sound frame is received ( WriteAudio(..) ) a little recently as 3 seconds, then the late frame should be started with a 3 second delay, but this is not so. The delayed frame is played back sequentially with the previous frame.
Sometimes I recorded for ~ 40 seconds, but the file duration is very similar to 2 minutes, but the audio / video plays only a few seconds, such as 40 seconds, and the rest of the file contains nothing and jumps right to en right after 40 seconds ( checked in VLC).
EDIT:
As suggested by Ronald S. Bultier, I realized:
m_pAudioStream->time_base = (AVRational){1, 9000}; // actually no need to set as 9000 is already default value for audio as you said m_pVideoStream->time_base = (AVRational){1, 9000};
should be installed, since now both audio and video streams are now in the same base units of time.
And for the video:
................... ................... int64_t dts = av_gettime(); // get current time in microseconds dts *= 9000; dts /= 1000000; // 1 second = 10^6 microseconds pkt.pts = AV_NOPTS_VALUE; // is it okay? pkt.dts = dts; // and no need to set pkt.duration, right?
And for audio: (just like video, right?)
................... ................... int64_t dts = av_gettime(); // get current time in microseconds dts *= 9000; dts /= 1000000; // 1 second = 10^6 microseconds pkt.pts = AV_NOPTS_VALUE; // is it okay? pkt.dts = dts; // and no need to set pkt.duration, right?
And I think that they now look like the same currDts , right? Please correct me if I am mistaken somewhere or something is missing.
Also, if I want to use a temporary video stream base like (AVRational){1, frameRate} and an audio stream time base like (AVRational){1, sampleRate} , what should the correct code look like?
EDIT 2.0:
m_pAudioStream->time_base = (AVRational){1, VIDEO_FRAME_RATE}; m_pVideoStream->time_base = (AVRational){1, VIDEO_FRAME_RATE};
AND
bool AudioVideoRecorder::WriteAudio(const unsigned char *pEncodedData, size_t iDataSize) { ........................... ...................... AVPacket pkt = {0}; av_init_packet(&pkt); int64_t dts = av_gettime() / 1000; // convert into millisecond dts = dts * VIDEO_FRAME_RATE; if(m_dtsOffset < 0) { m_dtsOffset = dts; } pkt.pts = AV_NOPTS_VALUE; pkt.dts = (dts - m_dtsOffset); pkt.stream_index = m_pAudioStream->index; pkt.flags |= AV_PKT_FLAG_KEY; pkt.data = (uint8_t*) pEncodedData; pkt.size = iDataSize; int ret = av_interleaved_write_frame(m_pFormatCtx, &pkt); if(ret < 0) { LogErr("Writing audio frame failed: %d", ret); return false; } Log("Writing audio frame done."); av_free_packet(&pkt); return true; } bool AudioVideoRecorder::WriteVideo(const unsigned char *pData, size_t iDataSize, bool const bIFrame) { ........................................ ................................. AVPacket pkt = {0}; av_init_packet(&pkt); int64_t dts = av_gettime() / 1000; dts = dts * VIDEO_FRAME_RATE; if(m_dtsOffset < 0) { m_dtsOffset = dts; } pkt.pts = AV_NOPTS_VALUE; pkt.dts = (dts - m_dtsOffset); if(bIFrame) { pkt.flags |= AV_PKT_FLAG_KEY; } pkt.stream_index = m_pVideoStream->index; pkt.data = (uint8_t*) pData; pkt.size = iDataSize; int ret = av_interleaved_write_frame(m_pFormatCtx, &pkt); if(ret < 0) { LogErr("Writing video frame failed."); return false; } Log("Writing video frame done."); av_free_packet(&pkt); return true; }
Is the last change in order? Video and audio seem synchronized. The only problem is that the sound is played without delay, regardless of which packet is delayed. How -
Packet arrival: 1 2 3 4 ... (then the next frame arrived after 3 seconds) .. 5
Listened Audio: 1 2 3 4 (no delay) 5
EDIT 3.0:
zero audio sample data:
AVFrame* pSilentData; pSilentData = av_frame_alloc(); memset(&pSilentData->data[0], 0, iDataSize); pkt.data = (uint8_t*) pSilentData; pkt.size = iDataSize; av_freep(&pSilentData->data[0]); av_frame_free(&pSilentData);
This is normal? But after writing this file in the file container during multimedia playback, there is dot dot noise. What is the problem?
EDIT 4.0:
Well, For sound ΞΌ-Law, a null value is represented as 0xff . So -
memset(&pSilentData->data[0], 0xff, iDataSize);
solve my problem.