Auto Thumbnail Selection for Video

We have all heard that YouTube uses deep learning to select a representative thumbnail for a user's video. But has anyone tried this with TensorFlow and had success?

I found https://github.com/yahoo/hecate, which claims to do this, but I was not impressed with the results. I actually got better results by using ffmpeg to extract keyframes and then calculating the color distribution of each one to select the "best" image.
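For illustration, here is a minimal sketch of what such a color-distribution baseline could look like. The post does not say how the "best" image was chosen, so the scoring rule here (Shannon entropy of a coarse RGB histogram) is an assumption, as are the function names `color_entropy` and `pick_best_frame`. The sketch also assumes the keyframes have already been extracted and decoded into lists of (r, g, b) tuples, and it uses only the standard library to stay self-contained.

```python
import math
from collections import Counter


def color_entropy(pixels, bins_per_channel=4):
    """Shannon entropy (in bits) of a coarse RGB histogram.

    pixels: iterable of (r, g, b) tuples with values in 0..255.
    This scoring rule is an assumption, not the poster's exact method.
    """
    step = 256 // bins_per_channel
    # Quantize each channel so that similar colors fall into the same bin.
    hist = Counter((r // step, g // step, b // step) for r, g, b in pixels)
    total = sum(hist.values())
    return -sum((n / total) * math.log2(n / total) for n in hist.values())


def pick_best_frame(frames):
    """Return the index of the frame with the highest color entropy."""
    scores = [color_entropy(pixels) for pixels in frames]
    return max(range(len(frames)), key=scores.__getitem__)


# Toy data: a flat dark frame and a frame with four distinct colors.
flat = [(10, 10, 10)] * 16
varied = [(0, 0, 0), (255, 0, 0), (0, 255, 0), (0, 0, 255)] * 4
best_index = pick_best_frame([flat, varied])
```

With this rule, a frame dominated by a single color scores low and a frame with a wide spread of colors scores high, which tends to skip black or washed-out frames.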

But I would like to know whether anyone can suggest a better approach that uses more "intelligent" algorithms.

1 answer

I want to be clear with the OP that this answer is not a formal description of the approach; it is only meant to describe the intended approach intuitively.

Suppose that a video consists of n frames and that each frame can be represented as a three-dimensional tensor (height, width, channels). A convolutional neural network (CNN) can be used to produce a hidden representation for each frame.

(f_1, f_2,..., f_n). (RNN). RNN , CNN. (f_1, f_2,..., f_n) , RNN, ( RNN).

YouTube-8M dataset, , , . , , RNN, , c, :

alpha = softmax(FNN(f_1), FNN(f_2), ..., FNN(f_n))
c = f_1 * alpha_1 + f_2 * alpha_2 + ... + f_n * alpha_n

FNN , f_i f_i , , . c, .
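The two formulas above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `fnn_score` is a hypothetical stand-in for FNN (a single linear layer with weights `w` that would normally be learned), the frame vectors are toy values rather than CNN outputs, and treating the frame with the largest alpha as a thumbnail candidate is one possible reading, not something the formulas themselves state.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scalar scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fnn_score(f, w, b=0.0):
    """Hypothetical stand-in for FNN: one linear layer mapping f_i to a scalar."""
    return sum(wi * fi for wi, fi in zip(w, f)) + b


def attention_pool(frames, w):
    """Compute alpha = softmax(FNN(f_1), ..., FNN(f_n)) and c = sum_i alpha_i * f_i."""
    alpha = softmax([fnn_score(f, w) for f in frames])
    dim = len(frames[0])
    c = [sum(a * f[d] for a, f in zip(alpha, frames)) for d in range(dim)]
    return alpha, c


# Toy frame representations (in practice these would come from a CNN).
frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
w = [0.5, 2.0]  # assumed weights; in a real model these are learned

alpha, c = attention_pool(frames, w)

# Assumption: use the most heavily weighted frame as the thumbnail candidate.
best = max(range(len(frames)), key=alpha.__getitem__)
```

The weights in `alpha` are non-negative and sum to 1, so `c` is a weighted average of the frame representations.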

, , . , . :

  • : c, , , - ;
  • : c, , , , .

, , , , , . , , OP, , .


Source: https://habr.com/ru/post/1671458/

