I am writing a program to stream live audio and video from a webcam to an RTMP server. I work on OS X 10.8, so I use the AVFoundation framework to capture audio and video frames from input devices. These frames arrive in the delegate method:
```objectivec
- (void)captureOutput:(AVCaptureOutput *)captureOutput
didOutputSampleBuffer:(CMSampleBufferRef)sampleBuffer
       fromConnection:(AVCaptureConnection *)connection
```

where `sampleBuffer` contains either audio or video data.
When I receive audio data in `sampleBuffer`, I try to convert it to an `AVFrame` and encode that frame with libavcodec:
```c
aframe = avcodec_alloc_frame(); // AVFrame *aframe;
int got_packet, ret;

CMItemCount numSamples = CMSampleBufferGetNumSamples(sampleBuffer); // CMSampleBufferRef
NSUInteger channelIndex = 0;

CMBlockBufferRef audioBlockBuffer = CMSampleBufferGetDataBuffer(sampleBuffer);
size_t audioBlockBufferOffset = (channelIndex * numSamples * sizeof(SInt16));
size_t lengthAtOffset = 0;
size_t totalLength = 0;
SInt16 *samples = NULL;
CMBlockBufferGetDataPointer(audioBlockBuffer, audioBlockBufferOffset,
                            &lengthAtOffset, &totalLength, (char **)(&samples));

const AudioStreamBasicDescription *audioDescription =
    CMAudioFormatDescriptionGetStreamBasicDescription(
        CMSampleBufferGetFormatDescription(sampleBuffer));

aframe->nb_samples  = (int)numSamples;
aframe->channels    = audioDescription->mChannelsPerFrame;
aframe->sample_rate = (int)audioDescription->mSampleRate;

// my webcam is configured to produce 16-bit 16 kHz LPCM mono,
// so the sample format is hardcoded here and seems to be correct
avcodec_fill_audio_frame(aframe, aframe->channels, AV_SAMPLE_FMT_S16,
                         (uint8_t *)samples,
                         aframe->nb_samples * av_get_bytes_per_sample(AV_SAMPLE_FMT_S16) * aframe->channels,
                         0);

// encoding audio
ret = avcodec_encode_audio2(c, &pkt, aframe, &got_packet);
if (ret < 0) {
    fprintf(stderr, "Error encoding audio frame: %s\n", av_err2str(ret));
    exit(1);
}
```
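One thing I am not sure about: an AAC encoder consumes a fixed number of samples per frame (typically 1024, reported in `AVCodecContext.frame_size` after `avcodec_open2`), while the capture callback delivers an arbitrary `numSamples` per buffer, so feeding each buffer straight to the encoder may be part of the problem. A minimal sketch of accumulating incoming samples and popping fixed-size frames (plain C with no FFmpeg dependency; `SampleFifo` and its functions are illustrative names, not AVFoundation or FFmpeg API):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Fixed-size sample FIFO: buffer whatever each CMSampleBuffer delivers,
 * pop exactly FRAME_SIZE samples per encoder call. */
#define FRAME_SIZE 1024            /* typical AAC frame length in samples */
#define FIFO_CAP   (FRAME_SIZE * 8)

typedef struct {
    int16_t data[FIFO_CAP];
    size_t  count;                 /* samples currently buffered */
} SampleFifo;

/* Append n samples; returns 0 on success, -1 if the FIFO would overflow. */
static int fifo_push(SampleFifo *f, const int16_t *samples, size_t n) {
    if (f->count + n > FIFO_CAP)
        return -1;
    memcpy(f->data + f->count, samples, n * sizeof(int16_t));
    f->count += n;
    return 0;
}

/* Pop exactly FRAME_SIZE samples into out; returns 1 if a full frame was
 * available, 0 otherwise (the remainder stays buffered for the next call). */
static int fifo_pop_frame(SampleFifo *f, int16_t *out) {
    if (f->count < FRAME_SIZE)
        return 0;
    memcpy(out, f->data, FRAME_SIZE * sizeof(int16_t));
    f->count -= FRAME_SIZE;
    memmove(f->data, f->data + FRAME_SIZE, f->count * sizeof(int16_t));
    return 1;
}
```

The idea would be to `fifo_push` inside the capture delegate and call the encoder in a loop while `fifo_pop_frame` returns 1.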
The problem is that when I play back the resulting frames, I hear the desired sound, but it is slowed down and choppy (as if a frame of silence followed each frame of data). Something seems to be wrong with the conversion from `CMSampleBuffer` to `AVFrame`, because a microphone preview created with AVFoundation from the same sample buffers plays back normally.
I would be grateful for your help.
UPD: Creating and initializing the AVCodecContext structure:
```c
audio_codec = avcodec_find_encoder(AV_CODEC_ID_AAC);
if (!audio_codec) {
    fprintf(stderr, "Could not find encoder for '%s'\n",
            avcodec_get_name(AV_CODEC_ID_AAC));
    exit(1);
}

audio_st = avformat_new_stream(oc, audio_codec); // AVFormatContext *oc;
if (!audio_st) {
    fprintf(stderr, "Could not allocate stream\n");
    exit(1);
}

audio_st->id = 1;
audio_st->codec->sample_fmt  = AV_SAMPLE_FMT_S16;
audio_st->codec->bit_rate    = 64000;
audio_st->codec->sample_rate = 16000;
audio_st->codec->channels    = 1;
audio_st->codec->codec_type  = AVMEDIA_TYPE_AUDIO;
```
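Another thing I suspect, given the "slowed down" playback, is packet timing: with the stream time base set to 1/sample_rate, the pts of each encoded packet should advance by the number of samples it contains, not by one per capture callback. A small sanity-check sketch (plain C; `next_audio_pts` and `pts_to_ms` are illustrative helpers, not FFmpeg API):

```c
#include <assert.h>
#include <stdint.h>

/* With a time base of 1/sample_rate, each packet's pts advances by the
 * number of samples it encodes (the encoder's frame_size). */
static int64_t next_audio_pts(int64_t current_pts, int nb_samples) {
    return current_pts + nb_samples;
}

/* Convert a sample-count pts to milliseconds, for checking that the
 * timestamps advance at real-time rate. */
static int64_t pts_to_ms(int64_t pts, int sample_rate) {
    return pts * 1000 / sample_rate;
}
```

For 16 kHz mono, 16000 samples' worth of pts should correspond to exactly one second of audio; if the timestamps grow faster than that, the player will stretch the sound out.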