What is a good metric for comparing feature vectors, and how should they be normalized before comparison?

Background:

I am working on a bottom-up approach to image segmentation, where I first over-segment the image into small regions / superpixels / supervoxels, and then iteratively merge neighboring segments based on some criteria. One criterion I have played with is measuring how similar two regions are in appearance. To quantify the appearance of a region I use several measurements - intensity statistics, texture features, etc. I concatenate all the features that I compute for a region into a long feature vector.

Question:

Given two adjacent segmented regions R1 and R2, let F1 and F2 be the corresponding feature vectors. My questions are as follows:

- What are good metrics for quantifying the similarity between F1 and F2?

- What is the best way to normalize F1 and F2 before measuring their similarity with such a metric? (Any supervised approach to normalization is off the table, because I do not want my algorithm to be tied to one particular set of images.)

My current solution:

    Similarity(R1, R2) = dot_product(F1 / norm(F1), F2 / norm(F2))

In words, I first normalize F1 and F2 to unit vectors, and then use the dot product between the two vectors as a similarity measure.
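
For concreteness, here is a minimal NumPy sketch of that measure; the toy vectors are just placeholders for the region feature vectors, and the epsilon guard is my own addition:

    import numpy as np

    # Proposed measure: normalize each feature vector to unit length, then take
    # the dot product (i.e. the cosine similarity). The small epsilon guards
    # against zero vectors.
    def similarity(F1, F2, eps=1e-12):
        F1 = np.asarray(F1, dtype=float)
        F2 = np.asarray(F2, dtype=float)
        return np.dot(F1 / (np.linalg.norm(F1) + eps),
                      F2 / (np.linalg.norm(F2) + eps))

    print(similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))  # ~1.0: same direction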

I wonder whether there are better ways to normalize the vectors and compare them. I would appreciate it if the community could point me to some references and explain why something else would work better than the similarity measure I am using.

+6
3 answers

State-of-the-art image segmentation algorithms use conditional random fields (CRFs) over superpixels (IMO SLIC is the best option for generating them). This kind of algorithm captures the relationships between neighboring superpixels while classifying each superpixel at the same time (usually with an SSVM).

To classify the superpixels, you usually build a bag of features for each of them, for example SIFT descriptors, histograms, or any other feature you think may help.
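
As a rough illustration of the superpixel + per-region features idea (not a full CRF/SSVM pipeline), here is a sketch using scikit-image's SLIC; the specific features computed (per-channel mean and standard deviation) are placeholders:

    import numpy as np
    from skimage import data
    from skimage.segmentation import slic

    # Over-segment an image with SLIC and build a simple per-superpixel feature
    # vector from intensity statistics. Swap in whatever descriptors you actually use.
    image = data.astronaut()
    labels = slic(image, n_segments=200, compactness=10)

    features = {}
    for lab in np.unique(labels):
        pixels = image[labels == lab].astype(float)      # (n_pixels, 3)
        features[lab] = np.concatenate([pixels.mean(axis=0), pixels.std(axis=0)])

    some_vector = next(iter(features.values()))
    print(len(features), "superpixels,", some_vector.size, "features each")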

There are many papers describing this process; here are some that I find interesting:

However, there are not many libraries or software packages for working with CRFs. The best you can find is in this blog post.

+4

I concatenate all the features that I compute for the region into a long feature vector. [...]

What are good metrics for quantifying the similarity between F1 and F2? [...]

What is the best way to normalize F1 and F2?

tl;dr: use a TF-IDF style of scoring, as described here (see the Discrete Approach, slides 18-35).


There is a (rather old) CBIR engine called GIFT (a.k.a. the GNU Image-Finding Tool) that follows exactly this approach to compute similarities between images.

What is particularly interesting about GIFT is that it applies text-retrieval methods to CBIR, which has in some ways become a classic approach (see "Video Google: A Text Retrieval Approach to Object Matching in Videos").

Concretely, GIFT extracts a large number of local and global low-level color and texture features, where each individual feature (for example, the amount of the i-th color in the histogram) can be seen as a visual word:

  • global color (HSV color histogram): 166 bins = 166 visual words
  • local color (color histogram analysis obtained by recursively splitting the input image into sub-regions): 340 (sub-regions) x 166 (bins) = 56,440 visual words
  • global texture (Gabor histogram): 3 (scales) x 4 (orientations) x 10 (ranges) = 120 visual words
  • local texture (Gabor histogram over a grid of sub-regions): 256 (sub-regions) x 120 (bins) = 30,720 visual words

So for any input image, GIFT can extract an 87,446-dimensional feature vector F, keeping in mind that a feature is considered either present in the image (with a certain frequency F[i]) or absent from it (F[i] = 0).
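
As an illustration of turning an image into visual-word frequencies, here is a small sketch that builds a quantized HSV color histogram; the bin counts (16 x 4 x 4 = 256 words) are arbitrary and not GIFT's actual quantization:

    import numpy as np
    from skimage import data
    from skimage.color import rgb2hsv

    # Treat each bin of a quantized HSV histogram as a visual word.
    image = data.astronaut()
    hsv = rgb2hsv(image).reshape(-1, 3)            # one row per pixel, values in [0, 1]

    hist, _ = np.histogramdd(hsv, bins=(16, 4, 4),
                             range=((0, 1), (0, 1), (0, 1)))
    F = hist.ravel() / hsv.shape[0]                # F[i] = frequency of visual word i
    print(F.size, "visual words,", np.count_nonzero(F), "present in this image")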

The trick is then to first index every image (here, every region) into an inverted file for efficient querying. In a second stage (query time), you can use each region as a query image.
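
A toy sketch of such an inverted file, assuming each region is already described by a sparse mapping from visual word id to term frequency (real engines use far more compact structures):

    from collections import defaultdict

    # Toy inverted file: visual word id -> list of (region_id, term frequency).
    # At query time only the posting lists of the query's visual words are visited.
    regions = {
        "R1": {3: 0.5, 17: 0.2, 42: 0.3},
        "R2": {3: 0.1, 99: 0.9},
    }

    inverted = defaultdict(list)
    for region_id, words in regions.items():
        for word, tf in words.items():
            inverted[word].append((region_id, tf))

    print(inverted[3])                             # [('R1', 0.5), ('R2', 0.1)]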

At query time, the engine uses the classic TF-IDF score:

    /*
     * Sum:             sum over each visual word i of the query image
     * TFquery(i):      term frequency of visual word i in the query image
     * TFcandidate(i):  term frequency of visual word i in the candidate image
     * CF(i):           collection frequency of visual word i in the indexed database
     */
    score(query, candidate) = Sum[ TFquery(i) * TFcandidate(i) * log**2( 1 / CF(i) ) ]
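
A direct Python transcription of this score, assuming the query and candidate are given as sparse {visual word id: term frequency} dicts and cf holds the collection frequencies (the names are illustrative, not GIFT's actual API):

    import math

    # score(query, candidate) = sum_i TFquery(i) * TFcandidate(i) * log^2(1 / CF(i))
    def tfidf_score(tf_query, tf_candidate, cf):
        score = 0.0
        for word, tf_q in tf_query.items():
            tf_c = tf_candidate.get(word, 0.0)
            if tf_c > 0.0:
                score += tf_q * tf_c * math.log(1.0 / cf[word]) ** 2
        return score

    cf = {3: 0.5, 17: 0.01, 42: 0.2, 99: 0.8}      # toy collection frequencies
    print(tfidf_score({3: 0.5, 17: 0.2}, {3: 0.1, 17: 0.4, 99: 0.9}, cf))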

Internally, things are a bit more complicated, since the engine:

  • performs sub-queries, focusing separately on each type of low-level feature (sub-query 1 = color histogram only, sub-query 2 = color blocks, etc.), and merges the scores,
  • includes feature pruning to evaluate only a certain percentage of the features.

GIFT is quite efficient, so I am sure you can find interesting ideas there to adapt. Of course, you can avoid the inverted index entirely if you have no performance constraints.

+1

I just want to point out that you do not really need to create unit vectors out of F1 and F2 before computing the cosine similarity (which is what your dot product amounts to): dividing by norm(F1) and norm(F2) is exactly the normalization that makes each a unit vector, so only the directions of the vectors are compared.

Other metrics for comparing vectors include the Euclidean, Manhattan, and Mahalanobis distances; the latter may not be applicable in your scenario. See Wikipedia for more details.

I have debated with myself several times whether to choose Euclidean or cosine. Note that which metric is appropriate depends on the context. If, in Euclidean space, you only want to measure whether two points are aligned along the same direction, the cosine measure makes sense. If you need an explicit distance metric, Euclidean is the better choice.
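
For reference, a small sketch comparing these metrics on two toy vectors with SciPy; note that the cosine measure ignores magnitude, while Euclidean and Manhattan do not (Mahalanobis also needs the inverse covariance of the feature distribution, so it is omitted here):

    import numpy as np
    from scipy.spatial import distance

    F1 = np.array([1.0, 2.0, 3.0])
    F2 = np.array([2.0, 4.0, 6.0])                 # same direction, twice the magnitude

    print("cosine similarity :", 1.0 - distance.cosine(F1, F2))   # ~1.0
    print("euclidean distance:", distance.euclidean(F1, F2))      # ~3.74
    print("manhattan distance:", distance.cityblock(F1, F2))      # 6.0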

0

Source: https://habr.com/ru/post/947761/

