I am combining all the features that I compute for the region into a long feature vector. [...]
What are good metrics to quantify the similarity between F1 and F2? [...]
What is the best way to normalize F1 and F2?
tl;dr: use TF-IDF-style scoring as described here (see the Discrete Approach, slides 18-35).
There is a (rather old) CBIR engine called GIFT (a.k.a. the GNU Image-Finding Tool) that follows exactly this approach to compute similarities between images.
What is particularly interesting about GIFT is that it applies text-retrieval methods to CBIR, which has in some ways become a classic approach (see, e.g., "Video Google: A Text Retrieval Approach to Object Matching in Videos").
In practice, GIFT extracts a large number of low-level local and global color and texture features, where each individual feature (e.g., the amount of the i-th color in the color histogram) can be thought of as a visual word:
- global color (HSV color histogram): 166 bins = 166 visual words
- local color (color histogram analysis by recursively splitting the input image into subregions): 340 (subregions) x 166 (bins) = 56,440 visual words
- global texture (Gabor histogram): 3 (scales) x 4 (orientations) x 10 (bins) = 120 visual words
- local texture (Gabor histogram over a grid of subregions): 256 (subregions) x 120 (bins) = 30,720 visual words
So for any input image, GIFT is able to extract an 87,446-dimensional feature vector F, keeping in mind that a feature is considered either present in the image (with a certain frequency F[i]) or absent (F[i] = 0).
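To make the "visual word" idea concrete, here is a minimal Python sketch of turning an HSV image into such a sparse bag of words. The 18 x 3 x 3 + 4 quantization reproduces the 166-bin count above, but the exact split, the grey-pixel threshold, and all names here are illustrative assumptions, not GIFT's actual code:

```python
import numpy as np

# Illustrative HSV quantization consistent with the 166-bin count above
# (18 hues x 3 saturations x 3 values + 4 grey levels = 166).
N_HUE, N_SAT, N_VAL, N_GREY = 18, 3, 3, 4
N_BINS = N_HUE * N_SAT * N_VAL + N_GREY  # 166

def hsv_visual_words(hsv_pixels):
    """Map an (N, 3) float array of HSV pixels (channels in [0, 1)) to a
    sparse {visual_word_id: frequency} dict, i.e. a bag of visual words."""
    h, s, v = hsv_pixels[:, 0], hsv_pixels[:, 1], hsv_pixels[:, 2]
    hue_bin = np.minimum((h * N_HUE).astype(int), N_HUE - 1)
    sat_bin = np.minimum((s * N_SAT).astype(int), N_SAT - 1)
    val_bin = np.minimum((v * N_VAL).astype(int), N_VAL - 1)
    words = (hue_bin * N_SAT + sat_bin) * N_VAL + val_bin
    # Nearly unsaturated pixels go to one of the grey bins instead
    # (the 0.05 threshold is an arbitrary choice for this sketch).
    grey = s < 0.05
    grey_bin = N_HUE * N_SAT * N_VAL + np.minimum((v * N_GREY).astype(int),
                                                  N_GREY - 1)
    words = np.where(grey, grey_bin, words)
    ids, counts = np.unique(words, return_counts=True)
    return {int(i): float(c) / len(hsv_pixels) for i, c in zip(ids, counts)}
```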
The trick is then to first index every image (here, every region) into an inverted file for efficient querying. In a second stage (query time), each region can be used as a query image.
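A minimal sketch of such an inverted file (names are mine, not GIFT's): rather than storing one dense 87,446-dimensional vector per image, you store, for each visual word, the list of images that contain it:

```python
from collections import defaultdict

# inverted_index[word_id] -> list of (image_id, term_frequency) postings
inverted_index = defaultdict(list)
n_images_with_word = defaultdict(int)  # used below to derive CF(i)
n_images = 0

def index_image(image_id, sparse_features):
    """sparse_features: {visual_word_id: frequency}, as sketched above."""
    global n_images
    for word_id, tf in sparse_features.items():
        inverted_index[word_id].append((image_id, tf))
        n_images_with_word[word_id] += 1
    n_images += 1
```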
At query time, the engine uses the classic TF-IDF scoring:

score(query, candidate) = Σ_i [ TF_query(i) × TF_candidate(i) × log²(1 / CF(i)) ]

where TF_x(i) is the term frequency of visual word i in image x, and CF(i) is its collection frequency, i.e. the fraction of indexed images containing it (so rare, discriminative words are weighted up).
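A direct, if naive, implementation of this scoring on top of the inverted-file sketch above (again, function and variable names are mine for illustration):

```python
import math
from collections import defaultdict

def score_query(query_features, inverted_index, n_images_with_word, n_images):
    """Return {candidate_id: score}; only candidates sharing at least one
    visual word with the query are ever touched (the inverted file's win)."""
    scores = defaultdict(float)
    for word_id, tf_query in query_features.items():
        postings = inverted_index.get(word_id)
        if not postings:
            continue
        cf = n_images_with_word[word_id] / n_images  # CF(i) in (0, 1]
        weight = math.log(1.0 / cf) ** 2             # log^2(1 / CF(i))
        for candidate_id, tf_candidate in postings:
            scores[candidate_id] += tf_query * tf_candidate * weight
    return scores
```

Note that a word present in every image gets CF = 1 and hence weight 0, which is exactly what you want: it carries no discriminative information.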
Internally, things are a bit more complex, since the engine:
- performs sub-queries, each focusing on a single type of low-level feature (sub-query 1 = color histogram only, sub-query 2 = color blocks, etc.), and merges the scores,
- includes feature pruning so that only a certain percentage of the features is evaluated (see the sketch below).
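As a rough illustration of those two refinements, building on the score_query sketch above (the block boundaries follow the counts listed earlier, but the 10% pruning rate and the plain score summation are my assumptions, not GIFT's exact behavior):

```python
from collections import defaultdict

# Hypothetical word-id layout following the counts listed earlier.
FEATURE_BLOCKS = {
    "global_color":   range(0, 166),
    "local_color":    range(166, 56_606),
    "global_texture": range(56_606, 56_726),
    "local_texture":  range(56_726, 87_446),
}

def prune(features, keep_fraction=0.10):
    """Feature pruning: keep only the strongest fraction of query words."""
    top = sorted(features.items(), key=lambda kv: kv[1], reverse=True)
    return dict(top[:max(1, int(len(top) * keep_fraction))])

def merged_sub_queries(query_features, inverted_index,
                       n_images_with_word, n_images):
    """One sub-query per feature type, partial scores merged by summation."""
    merged = defaultdict(float)
    for block in FEATURE_BLOCKS.values():
        sub = {w: tf for w, tf in query_features.items() if w in block}
        if not sub:
            continue
        partial = score_query(prune(sub), inverted_index,
                              n_images_with_word, n_images)
        for candidate_id, s in partial.items():
            merged[candidate_id] += s
    return merged
```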
GIFT is pretty efficient, so I'm sure you can find interesting ideas in it to adapt. Of course, if you have no performance constraints, you can do without an inverted index altogether.