tl;dr: sprite sheets are generated on the server and downloaded as needed.
It is reasonable to assume that the images are created on the server side. YouTube does far more intensive processing on every video than pulling out a few thumbnails. In addition, a quick Google search suggests that this feature has existed for several years, probably longer than browsers have had the HTML5/JS horsepower to do this on the client side.
I opened Developer Tools > Resources in my browser and checked a recently uploaded video from my channel. Interestingly, there were no thumbnails yet (the video had only been up for about 20 minutes when I checked). This suggests that the images are created on the server side and simply had not finished processing.
Checking an older video and looking at Resources > Images showed nothing interesting. So I switched to Timelines, clicked record, then started scrubbing along the video's timeline while watching the network traffic. As I moved the mouse, *.jpg files started to load, each containing 25 thumbnails from that section of the video:

I also noticed that the initial file, M0.jpg, is an image of the same size but contains about 100 thumbnails from the entire video rather than 25 thumbnails from one segment. Example:

Re-testing with a new video, it looks like the 100-thumbnail M0.jpg is downloaded first and provides basic lower-resolution thumbnails. Then, as you hover over various sections of the video, the higher-resolution M0.jpg, M1.jpg, etc. are loaded as needed.
Interestingly, this does not change for longer videos, which explains why the thumbnails can sometimes be poor. If your connection or YouTube is too slow to fetch the higher-resolution thumbnails, you are stuck with only 100 low-resolution thumbnails for a really long video. I am not sure how this works on shorter videos. It would also be interesting to see how the thumbnails are distributed (is it simply linear, one every 1/100th of the video, or something else?).
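The two-tier lookup observed above can be sketched roughly as follows. This is only an illustration, not YouTube's actual scheme: the grid shapes (10x10 for the whole-video sheet, 5x5 for the segment sheets), the evenly spaced sampling, the `seconds_per_thumb` parameter, and the function names are all assumptions. Only the rough counts (about 100 and 25 thumbnails per sheet) and the M0.jpg, M1.jpg naming come from the observations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    sheet: str  # file name such as "M0.jpg"
    row: int    # zero-based row inside the sheet
    col: int    # zero-based column inside the sheet


def locate_overview(t: float, duration: float, cols: int = 10, rows: int = 10) -> Cell:
    """Map a hover time to a cell of the single low-resolution, whole-video sheet.

    Assumes thumbnails are spaced evenly over the video (not confirmed).
    """
    if duration <= 0:
        raise ValueError("duration must be positive")
    count = cols * rows
    # Clamp so that t == duration still maps to the last thumbnail.
    index = min(int(t / duration * count), count - 1)
    return Cell("M0.jpg", index // cols, index % cols)


def locate_detail(t: float, seconds_per_thumb: float, cols: int = 5, rows: int = 5) -> Cell:
    """Map a hover time to a cell of the higher-resolution sheets (25 per sheet).

    The detail sheets reuse the M0.jpg, M1.jpg, ... names seen in the network
    traffic; seconds_per_thumb is a hypothetical sampling interval.
    """
    per_sheet = cols * rows
    index = int(t // seconds_per_thumb)
    sheet_number, within = divmod(index, per_sheet)
    return Cell(f"M{sheet_number}.jpg", within // cols, within % cols)
```

A player following this pattern could show the overview cell immediately and swap in the detail cell once the corresponding sheet has finished downloading, which would match the behaviour of being stuck with coarse thumbnails on a slow connection.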
One last tidbit: I noticed that if you use a URL with a timecode in it, you do not get the full 100-thumbnail M0.jpg sheet, but rather a completely different M#.jpg sheet containing about 25 low-resolution thumbnails covering the span from the timecode to the end of the video.

I assume the reasoning is that when people link to a specific timecode, viewers are unlikely to jump back to an earlier point in the video. This is also less granular than the 75 images you would have received from the regular 100-image M0.jpg. On the other hand, it is only about 30% of the size, so perhaps speed was that important.
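To make the granularity trade-off concrete, here is a small worked calculation. It assumes evenly spaced thumbnails and a hypothetical 10-minute video linked at 2:30; neither figure comes from the observations, but a link about a quarter of the way in is consistent with the "75 images" mentioned above.

```python
def thumbs_after(start: float, duration: float, sheet_count: int = 100) -> int:
    """Thumbnails of an evenly spaced whole-video sheet that fall at or after start."""
    first = int(start / duration * sheet_count)
    return sheet_count - first


def seconds_per_thumb(start: float, duration: float, count: int) -> float:
    """Average spacing when count thumbnails cover the span from start to the end."""
    return (duration - start) / count


duration = 600.0  # hypothetical 10-minute video
start = 150.0     # hypothetical link to 2:30

usable = thumbs_after(start, duration)                   # thumbnails left in the 100-image sheet
regular_spacing = seconds_per_thumb(start, duration, usable)
timecode_spacing = seconds_per_thumb(start, duration, 25)
```

Under these assumptions the regular sheet would leave 75 usable thumbnails, one every 6 seconds, while the 25-thumbnail timecode sheet gives one every 18 seconds.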
As for generating the thumbnails, ffmpeg is a good way to go:
To take several screenshots and place them in a single image file (creating tiles), you can use FFmpeg's tile video filter, for example:
ffmpeg -ss 00:00:10 -i movie.avi -frames 1 -vf "select=not(mod(n\,1000)),scale=320:240,tile=2x3" out.png
This seeks 10 seconds into the movie, selects every 1000th frame, scales it to 320x240 pixels, and creates 2x3 tiles in the output image out.png, which will look like this:

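If sheets need to be generated for many videos, the same command can be assembled programmatically. The sketch below only builds the argument list for the ffmpeg invocation shown above; the function name `build_tile_command` and its parameters are hypothetical, and the flags and filters are exactly those from the command, nothing more.

```python
from typing import List


def build_tile_command(
    src: str,
    out: str,
    start: str = "00:00:10",
    every_n_frames: int = 1000,
    width: int = 320,
    height: int = 240,
    cols: int = 2,
    rows: int = 3,
) -> List[str]:
    """Build the argv list for a tiled-screenshot ffmpeg command."""
    filters = ",".join([
        # The backslash escapes the comma inside the expression so the
        # filtergraph parser does not treat it as a filter separator.
        f"select=not(mod(n\\,{every_n_frames}))",
        f"scale={width}:{height}",
        # The tile layout is given as COLUMNSxROWS.
        f"tile={cols}x{rows}",
    ])
    return ["ffmpeg", "-ss", start, "-i", src, "-frames", "1", "-vf", filters, out]
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` on a machine with ffmpeg installed. Because it is an argument list rather than a shell string, no extra shell quoting is needed. Calling it with `cols=10, rows=10` would produce a 100-thumbnail sheet in the spirit of M0.jpg, although `every_n_frames` would have to be chosen to suit the length of the video.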