Understanding FFMPEG Video Encoding

This comes from the encoding example that ships with ffmpeg. I can follow the authors' audio-encoding example reasonably well, but the C code below puzzles me (I added the block-number comments so I can refer to specific parts) ...

static void video_encode_example(const char *filename)
{
    AVCodec *codec;
    AVCodecContext *c = NULL;
    int i, out_size, size, x, y, outbuf_size;
    FILE *f;
    AVFrame *picture;
    uint8_t *outbuf, *picture_buf;
    //BLOCK ONE

    printf("Video encoding\n");

    /* find the mpeg1 video encoder */
    codec = avcodec_find_encoder(CODEC_ID_MPEG1VIDEO);
    if (!codec) {
        fprintf(stderr, "codec not found\n");
        exit(1);
        //BLOCK TWO
    }

    c = avcodec_alloc_context();
    picture = avcodec_alloc_frame();

    /* put sample parameters */
    c->bit_rate = 400000;
    /* resolution must be a multiple of two */
    c->width = 352;
    c->height = 288;
    /* frames per second */
    c->time_base = (AVRational){1, 25};
    c->gop_size = 10; /* emit one intra frame every ten frames */
    c->max_b_frames = 1;
    c->pix_fmt = PIX_FMT_YUV420P;
    //BLOCK THREE

    /* open it */
    if (avcodec_open(c, codec) < 0) {
        fprintf(stderr, "could not open codec\n");
        exit(1);
    }

    f = fopen(filename, "wb");
    if (!f) {
        fprintf(stderr, "could not open %s\n", filename);
        exit(1);
    }
    //BLOCK FOUR

    /* alloc image and output buffer */
    outbuf_size = 100000;
    outbuf = malloc(outbuf_size);
    size = c->width * c->height;
    picture_buf = malloc((size * 3) / 2); /* size for YUV 420 */

    picture->data[0] = picture_buf;
    picture->data[1] = picture->data[0] + size;
    picture->data[2] = picture->data[1] + size / 4;
    picture->linesize[0] = c->width;
    picture->linesize[1] = c->width / 2;
    picture->linesize[2] = c->width / 2;
    //BLOCK FIVE

    /* encode 1 second of video */
    for (i = 0; i < 25; i++) {
        fflush(stdout);
        /* prepare a dummy image */
        /* Y */
        for (y = 0; y < c->height; y++) {
            for (x = 0; x < c->width; x++) {
                picture->data[0][y * picture->linesize[0] + x] = x + y + i * 3;
            }
        }
        //BLOCK SIX

        /* Cb and Cr */
        for (y = 0; y < c->height / 2; y++) {
            for (x = 0; x < c->width / 2; x++) {
                picture->data[1][y * picture->linesize[1] + x] = 128 + y + i * 2;
                picture->data[2][y * picture->linesize[2] + x] = 64 + x + i * 5;
            }
        }
        //BLOCK SEVEN

        /* encode the image */
        out_size = avcodec_encode_video(c, outbuf, outbuf_size, picture);
        printf("encoding frame %3d (size=%5d)\n", i, out_size);
        fwrite(outbuf, 1, out_size, f);
    }
    //BLOCK EIGHT

    /* get the delayed frames */
    for (; out_size; i++) {
        fflush(stdout);
        out_size = avcodec_encode_video(c, outbuf, outbuf_size, NULL);
        printf("write frame %3d (size=%5d)\n", i, out_size);
        fwrite(outbuf, 1, out_size, f);
    }
    //BLOCK NINE

    /* add sequence end code to have a real mpeg file */
    outbuf[0] = 0x00;
    outbuf[1] = 0x00;
    outbuf[2] = 0x01;
    outbuf[3] = 0xb7;
    fwrite(outbuf, 1, 4, f);
    fclose(f);
    free(picture_buf);
    free(outbuf);

    avcodec_close(c);
    av_free(c);
    av_free(picture);
}
//BLOCK TEN
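(For completeness, here is how I am driving this function: a minimal main(), assuming the same old libavcodec API as the example above. In those releases, avcodec_init() and avcodec_register_all() both had to run before any codec lookup, and the include path may be plain "avcodec.h" on older installs.)

#include "libavcodec/avcodec.h"

int main(int argc, char **argv)
{
    /* both calls were mandatory in this era of libavcodec
       before any encoder could be found or opened */
    avcodec_init();
    avcodec_register_all();

    video_encode_example(argc > 1 ? argv[1] : "/tmp/test.mpg");
    return 0;
}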

Here's what I can make of the code, block by block ...

BLOCK ONE: Initializes variables and pointers. I could not find the AVFrame structure in the ffmpeg source code yet, so I don't know what it actually holds.

BLOCK TWO: Looks up the MPEG-1 codec, and exits if it is not found.

BLOCK THREE: Sets the sample video parameters. The only thing I don't understand is the GOP size; I read about intra frames, and I still do not understand what they are.

BLOCK FOUR: Opens the codec and the output file for writing ...

BLOCK FIVE: This is where they really start to lose me. Part of it is probably because I don't know exactly what AVFrame is, but why do they allocate only 3/2 of the image size?

BLOCKS SIX AND SEVEN: I do not understand what the math here is trying to achieve.

BLOCK EIGHT: It looks like avcodec_encode_video() does all the real work here; I haven't dug into it yet.

BLOCK NINE: Since this sits outside the 25-frame loop, I assume it retrieves the remaining (delayed) frames?

BLOCK TEN: Close, free mem, etc.

I know this is a big block of code to be confused about; any input would be helpful. I've taken on a job that's over my head here. Thanks in advance.

2 answers

As HonkyTonk already replied, the comment spells it out: prepare a dummy image. I'm guessing you may be confused about exactly how that dummy image is created, especially if you are not familiar with the YUV/YCbCr color space. Read the Wikipedia treatment for the basics.
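If it helps to connect it back to RGB, here is a sketch of the full-range BT.601 RGB-to-YCbCr conversion, the same family of formulas that Wikipedia article derives (the helper names here are mine, just for illustration):

#include <stdint.h>

/* Full-range BT.601 RGB -> YCbCr, the variant typically paired with
   YUV420P in JPEG/MPEG-style pipelines. */
static uint8_t clamp_u8(double v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)(v + 0.5);
}

static void rgb_to_ycbcr(uint8_t r, uint8_t g, uint8_t b,
                         uint8_t *y, uint8_t *cb, uint8_t *cr)
{
    *y  = clamp_u8( 0.299    * r + 0.587    * g + 0.114    * b);
    *cb = clamp_u8(-0.168736 * r - 0.331264 * g + 0.5      * b + 128.0);
    *cr = clamp_u8( 0.5      * r - 0.418688 * g - 0.081312 * b + 128.0);
}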

Many video codecs work in the YUV color space. This often confuses programmers who are only used to working in RGB. The executive summary is that, for this variant (YUV 4:2:0 planar), each pixel in the image gets its own Y sample (note that the Y loop iterates over every (x, y) pair), while each 2x2 square of pixels shares one U/Cb sample and one V/Cr sample (notice in block seven that the iteration runs over width/2 and height/2).
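In code terms, the subsampling means the three samples for a pixel at (x, y) live at these offsets; a sketch against the picture/linesize layout from the question (the helper name is made up):

#include <stdint.h>
#include "libavcodec/avcodec.h" /* for AVFrame */

/* Every pixel has its own Y sample, while each 2x2 square of pixels
   shares one Cb and one Cr sample, hence the /2 on the chroma indices. */
static void get_yuv420p_pixel(const AVFrame *picture, int x, int y,
                              uint8_t *Y, uint8_t *Cb, uint8_t *Cr)
{
    *Y  = picture->data[0][ y      * picture->linesize[0] +  x     ];
    *Cb = picture->data[1][(y / 2) * picture->linesize[1] + (x / 2)];
    *Cr = picture->data[2][(y / 2) * picture->linesize[2] + (x / 2)];
}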

The pattern being generated looks like some kind of gradient. If you want to make a known change, set Y/Cb/Cr all to 0 and the dummy image will be all green. Set Cb and Cr to 128 and Y to 255 to get a white frame; slide Y down to 0 to see black; set Y to anything in between, keeping Cb and Cr at 128, to see shades of gray.
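To try that out, you can replace the two dummy-image loops in the example with plane fills; a sketch of a solid frame using the same buffers (memset is from <string.h>):

/* Solid-color test frame: with Cb = Cr = 128 the frame is neutral,
   and the Y plane alone picks the shade. */
memset(picture->data[0], 255, c->height * picture->linesize[0]);      /* Y: 255 = white, 0 = black */
memset(picture->data[1], 128, c->height / 2 * picture->linesize[1]);  /* Cb */
memset(picture->data[2], 128, c->height / 2 * picture->linesize[2]);  /* Cr */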


Sharing my understanding [quite a late reply!]:

YUV420P:

YUV 420P, or YCbCr, is an alternative representation to RGB, and it contains 3 planar components: Y (luminance), U (Y-Cb) and V (Y-Cr). [And since Y-Cb-Cr-Cg = constant, we don't need to store the Cg component, because it can usually be computed.] Just as RGB888 requires 3 bytes per pixel, YUV420 requires 1.5 bytes per pixel [@Find (how the 12 bits are divided among the components, and in what ratio)]. Here P stands for Planar, which means the components are stored one whole plane after another: V follows U, U follows Y, and a YUV frame is simply a byte array!! The other variant is I, which stands for Interleaved, meaning the UV plane data is interleaved in between the Y-plane data in a specific fashion [@Find (how)].
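To make the 1.5 bytes per pixel concrete, here is how a YUV420P byte array splits into planes; this is the same arithmetic as BLOCK FIVE in the question (function and variable names are mine):

#include <stdint.h>
#include <stdlib.h>

/* For an even width x height frame:
     Y plane : width * height bytes          -> 8 bits per pixel
     Cb plane: (width/2) * (height/2) bytes  -> 2 bits per pixel (shared 2x2)
     Cr plane: (width/2) * (height/2) bytes  -> 2 bits per pixel (shared 2x2)
   total: width * height * 3 / 2 bytes, i.e. 12 bits per pixel. */
static uint8_t *alloc_yuv420p(int width, int height,
                              uint8_t **y, uint8_t **cb, uint8_t **cr)
{
    int size = width * height;
    uint8_t *buf = malloc(size * 3 / 2);
    *y  = buf;                    /* Y plane first...       */
    *cb = buf + size;             /* ...then the whole U... */
    *cr = buf + size + size / 4;  /* ...then the whole V    */
    return buf;
}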

