How to estimate GIF file size?

We are building an online video-editing service. One feature lets users export a short segment of their video as an animated GIF. Imgur has a file size limit of 2 MB per uploaded animated GIF.

The size of the GIF file depends on the number of frames, the color depth, and the image content itself: a solid flat color results in a very light GIF, while an animation of random TV noise will be quite heavy.

First, I export each video frame as a PNG at the final GIF frame size (fixed: 384x216).

Then, to maximize the quality of the GIF, I make several attempts to render it with slightly different parameters, varying the number of frames and the number of colors in the GIF palette. The render with the best quality that stays under the file size limit is uploaded to Imgur.

Each render takes time and CPU resources, so I want to optimize this.

Question: what is a reasonable way to estimate the best rendering settings from the actual images, get the file as close as possible to the size limit, and keep the number of rendering attempts down to 2-3?

2 answers

The GIF image format uses LZW compression, notorious as the proprietary algorithm whose owner, Unisys, aggressively pursued royalties just as the image format became popular. We have that to thank for PNG.

The amount by which LZW can compress an image is highly non-deterministic and depends strongly on the image content. The best you can do is give the user a heuristic that estimates the final file size. Display the prediction with, say, a colored bar; you can compute it quickly by converting only the first frame. That takes little time for a 384x216 image and works in human time, a fraction of a second.

Then extrapolate the effective compression ratio of that first frame to the subsequent frames, which should encode only small differences from the previous frame and so should have comparable compression ratios.
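The two paragraphs above can be sketched as follows. This is a rough illustration, not the answerer's actual code: zlib's DEFLATE stands in for GIF's LZW, and `inter_frame_ratio` is a made-up tuning knob you would calibrate yourself.

```python
import random
import zlib

def estimate_gif_size(first_frame_indices: bytes, n_frames: int,
                      inter_frame_ratio: float = 0.2) -> int:
    """Ballpark GIF size: compress the first frame's palette indices and
    assume each later frame costs only a fraction of that, since it
    encodes small differences from its predecessor.  DEFLATE stands in
    for GIF's LZW and inter_frame_ratio is an assumed constant, so
    treat the result as an estimate, never a guarantee."""
    first = len(zlib.compress(first_frame_indices, 9))
    return int(first * (1 + inter_frame_ratio * (n_frames - 1)))

# A 384x216 frame of one solid colour compresses extremely well...
flat = bytes(384 * 216)
print(estimate_gif_size(flat, 24))

# ...while random noise barely compresses at all.
random.seed(0)
noise = bytes(random.randrange(256) for _ in range(384 * 216))
print(estimate_gif_size(noise, 24))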

You really cannot know whether it exceeds the site's size limit until you have encoded the entire sequence. So be sure to emphasize in your interface design that the prediction is only an estimate, so the user won't be too disappointed. And of course give them tools to reduce the size, such as nearest-neighbor scaling, which makes the pixels in the image larger. Focusing on shrinking the later frames can also pay off; GIF encoders usually don't do that well on their own. YMMV.


There is no simple answer to this question. The size of a single-frame GIF mainly depends on the entropy of the image after quantization, and you can try using the standard deviation as a proxy, computed for example with ImageMagick:

identify -format "%[fx:standard_deviation]" imagename.png 
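For illustration, the same normalized figure can be computed in pure Python. This is a sketch under our own conventions (8-bit grayscale samples in a flat list; the function name is ours):

```python
import statistics

def normalized_stddev(pixels: list[int]) -> float:
    """Standard deviation of 8-bit samples scaled to the 0..1 range,
    i.e. the figure ImageMagick's %[fx:standard_deviation] reports.
    Assumes grayscale; for RGB you would average the channels first."""
    return statistics.pstdev(pixels) / 255.0

flat = [128] * 1000        # solid colour: no variation at all
bands = [0, 255] * 500     # maximum-contrast stripes
print(normalized_stddev(flat))   # 0.0
print(normalized_stddev(bands))  # 0.5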

You can probably get better results by running a smoothing kernel on the image first to remove high-frequency noise, which is unlikely to carry useful information and very likely to hurt compression performance. In any case, this works much better for JPEG than for GIF.

Then, in general, you want to run a large number of samples to build a table like this (say you have a single compression parameter Q):

    STDDEV   SIZE W/Q=1   SIZE W/Q=2   SIZE W/Q=3   ...
    value1   v1,1         v1,2         v1,3

After several dozen tests (which you need to run only once, not at run time), you will have both an estimate and a measure of its error. You will then see that, say, an image with stddev 0.45 that compresses to 108 KB at Q=1 will compress to 91 KB plus or minus 5 at Q=2, 88 KB plus or minus 3 at Q=3, and so on.

At that point, given an unknown image, you measure its stddev and its compressed size at Q=1, and you can interpolate the probable size at, say, Q=4 without actually running the encoding.
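A minimal sketch of that lookup. The table layout is our own, and every calibration number except the 108/91/88 KB figures quoted above is invented; you would fill the table from your offline test runs.

```python
import bisect

def predict_size(calibration: dict[float, dict[int, int]],
                 stddev: float, q: int) -> int:
    """Linearly interpolate the probable compressed size (in KB) for an
    unseen image from an offline calibration table keyed by stddev,
    then by Q.  Clamps outside the calibrated stddev range."""
    keys = sorted(calibration)
    if stddev <= keys[0]:
        return calibration[keys[0]][q]
    if stddev >= keys[-1]:
        return calibration[keys[-1]][q]
    i = bisect.bisect_left(keys, stddev)
    lo, hi = keys[i - 1], keys[i]
    t = (stddev - lo) / (hi - lo)
    return round(calibration[lo][q] * (1 - t) + calibration[hi][q] * t)

# Toy calibration; the 0.45 row echoes the figures in the text,
# the 0.30 row is made up for the example.
table = {0.30: {1: 70, 2: 60, 3: 57},
         0.45: {1: 108, 2: 91, 3: 88}}
print(predict_size(table, 0.375, 2))  # halfway between the two rows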

While your service is running, you can keep gathering statistics (i.e., after you actually perform an encoding, save the actual results) to further refine the estimate. You are only storing a few numbers, not any potentially sensitive or personal information that might be in the video, and collecting and storing those numbers costs almost nothing.

Background

It might be useful to detect images with a fixed background; in that case you can adapt the frames so that they are identical in some areas, and the GIF encoder does not have to store that information again. When and if you get such a video (a talking head, for example), this can yield huge savings (but it will throw off your parameter estimation, unless you can also estimate the actual size of the background area. In that case, let that area be B and let the frame area be A: the compressed size of the "image" for five frames will be roughly A + (A - B) * (5 - 1) instead of A * 5, and you can apply this correction factor to the estimate).
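The correction factor from the parenthesis can be written out directly; a trivial sketch (the function name and the `static_fraction` parameterization are ours):

```python
def background_corrected_estimate(per_frame_size: float, n_frames: int,
                                  static_fraction: float) -> float:
    """Apply the A + (A - B) * (N - 1) rule from the text: after the
    first frame, only the non-static part of each frame costs anything.
    Here B = static_fraction * A."""
    a = per_frame_size
    b = static_fraction * a
    return a + (a - b) * (n_frames - 1)

# Five frames, 80% static background: 100 + 20 * 4 instead of 100 * 5.
print(background_corrected_estimate(100, 5, 0.8))  # 180.0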

Compression optimization

Then there are optimization techniques that slightly modify the image to make it compress better, but that strays from the topic at hand. I had several algorithms that worked very well with paletted PNG, which is quite similar to GIF, but I would have to check whether they can be used freely.

Some thoughts: the LZW algorithm works line by line. So whenever a run of N pixels differs "less than X%" (perceptually or arithmetically) from an already-encountered run, rewrite that run:

  018298765676523456789876543456787654
  987678656755234292837683929836567273

Here the sequence 656765234 in the first row almost matches the sequence 656755234 in the second row. By changing the mismatched 5 to a 6, the LZW algorithm can likely pick up the entire sequence and store it as one symbol instead of three (6567, 5, 5234) or more.
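The effect can be demonstrated with a toy LZW encoder on the two runs from the example above. This is a minimal sketch: GIF's real LZW uses variable-width codes seeded from the palette, but the dictionary behaviour is the same.

```python
def lzw_encode(data: str) -> list[int]:
    """Minimal LZW over the characters present in the input; returns
    the code stream.  Illustrative only -- no variable-width output."""
    dictionary = {ch: i for i, ch in enumerate(sorted(set(data)))}
    out, w = [], ""
    for ch in data:
        if w + ch in dictionary:
            w += ch
        else:
            out.append(dictionary[w])        # emit code for longest match
            dictionary[w + ch] = len(dictionary)  # learn the new sequence
            w = ch
    out.append(dictionary[w])
    return out

seq = "656765234"    # run from the first row above
near = "656755234"   # near-match from the second row (one digit differs)

# An exact repeat compresses into fewer codes than a near-match:
dup_len = len(lzw_encode(seq + seq))
near_len = len(lzw_encode(seq + near))
print(dup_len, near_len)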

In addition, LZW works on bits, not bytes. Very roughly, this means that the more evenly balanced the 0s and 1s are, the worse the compression; and the more unpredictable their sequence, the worse the results.

So if we can make the bit distribution more favorable, we win.

And we can do that, losslessly (the same trick works with PNG). As soon as we have quantized the image, we pick its most common color and give it palette index 0: that is 00000000, eight zero bits. Then we pick the second most common color and give it index 1, i.e. 00000001: another seven zeros and a one. The next colors get indices 2, 4, 8, 16, 32, 64 and 128; each of them has a single 1 bit, and all the rest are zeros.
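The reindexing scheme can be sketched in a few lines. The function is our own illustration of the idea, not the answerer's codec, and it only remaps colors actually present in the pixel data; the same permutation must of course be applied to the palette itself.

```python
from collections import Counter

def reindex_palette(indices: list[int]) -> tuple[list[int], dict[int, int]]:
    """Losslessly remap palette indices so the most frequent colours get
    the byte values with the fewest 1-bits: 0, then 1, 2, 4, 8, ...,
    then the two-bit values, and so on.  Returns the remapped pixels
    and the old->new index map."""
    # All 256 byte values ordered by population count, then by value.
    low_bit_first = sorted(range(256), key=lambda v: (bin(v).count("1"), v))
    freq_order = [colour for colour, _ in Counter(indices).most_common()]
    mapping = {old: low_bit_first[rank] for rank, old in enumerate(freq_order)}
    return [mapping[i] for i in indices], mapping

pixels = [7] * 80 + [3] * 15 + [9] * 5   # colour 7 dominates
remapped, mapping = reindex_palette(pixels)
print(mapping)  # most common colour -> 0, next -> 1, then 2, 4, 8, ...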

Since color frequencies tend to follow a power law, it is reasonable to assume that at least 20% of the pixels will be painted with the nine most common colors, and so that 20% of the data stream can be made at least 87.5% zeros. Most of those will be runs of consecutive zeros, which LZW will appreciate no end.

Best of all, this intervention is completely lossless: the reindexed pixels still show the same colors, only the palette is permuted accordingly. I built such a codec for PNG a few years ago, and in my use case (PNG street maps) it gave very good results, roughly a 20% compression improvement. With more varied palettes and with the LZW algorithm the results probably won't be as good, but the processing is fast and not hard to implement.


Source: https://habr.com/ru/post/969997/

