What is the theory of choosing one element based on various criteria?

I need to solve a problem in which element A must be compared with thousands of other elements to find out which of them are most similar to A.

I want to assign a weight to each of these elements depending on how similar it is to element A. Several criteria determine the final weight. For example, if item1.someProperty == otherItem.someProperty, I increase the weight by 5; if item1.anotherProperty == otherItem.anotherProperty, I increase the weight by only 1, because someProperty is more important than anotherProperty.
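The scheme described above can be sketched as follows. This is only an illustration: the items are assumed to be plain dictionaries, and the `WEIGHTS` table, `similarity_score`, and `rank_by_similarity` are hypothetical names that reuse the 5 and 1 from the example.

```python
# Hypothetical weight table: property name -> points awarded on a match.
# The values 5 and 1 come from the example in the question.
WEIGHTS = {"someProperty": 5, "anotherProperty": 1}


def similarity_score(item, other, weights=WEIGHTS):
    """Sum the weights of every property on which both items agree."""
    score = 0
    for prop, weight in weights.items():
        # Only count a property that is present in both items and equal.
        if prop in item and prop in other and item[prop] == other[prop]:
            score += weight
    return score


def rank_by_similarity(target, candidates, weights=WEIGHTS):
    """Return the candidates ordered from most to least similar to target."""
    return sorted(
        candidates,
        key=lambda c: similarity_score(target, c, weights),
        reverse=True,
    )
```

Calling `rank_by_similarity(element_a, all_other_elements)` would then put the closest matches first.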

I describe all of this because I want to know whether there is any theory that would help me build such a system: in particular, how to choose the weight of each criterion and how to calculate the final weight of an element.

Does anyone know if there is any theory that could help? Or maybe there is a better way to do what I'm trying to do?

+4
3 answers

You can treat your properties as dimensions and define a distance over them. If the properties are correlated, you can take that into account as well (search for Mahalanobis distance).

But basically it comes down to:

  float distance(a, b) { return w1 * ABS(a.x - b.x) + w2 * ABS(a.y - b.y) + ...; }

Instead of summing the absolute differences, you could sum the squared differences (to penalize large differences more heavily); anything goes.
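A minimal runnable version of that distance, assuming each element has already been reduced to a list of numeric properties and that `weights` holds one weight per property (the function name and the `squared` flag are made up for this sketch):

```python
def weighted_distance(a, b, weights, squared=False):
    """Weighted sum of per-property differences between a and b.

    With squared=False each term is w * |x - y| (weighted Manhattan distance).
    With squared=True each term is w * (x - y)**2, which penalizes large
    differences more heavily.
    """
    if not (len(a) == len(b) == len(weights)):
        raise ValueError("a, b and weights must have the same length")
    total = 0.0
    for x, y, w in zip(a, b, weights):
        diff = abs(x - y)
        total += w * (diff * diff if squared else diff)
    return total
```

A smaller result means the two elements are more similar, so the weight of a property controls how much a mismatch in it pushes elements apart.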

BTW, for nominal data you could use an entropy-based measure of dissimilarity.

+2

At least on the surface, this looks like the vector space model (VSM) from information retrieval (IR). That model is usually built on bags of words, but it can be adapted to other representations of the data.

The weights you describe correspond to what are called "field boosts" in the IR VSM.

But see also nearest neighbor search.
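As a rough illustration of the vector space idea with per-field weights, the sketch below builds a weighted term vector for each item and compares two vectors by cosine similarity. It is a simplification and not the scoring formula of any particular search engine; the item layout (a dictionary mapping field names to token lists) and the names `boosted_vector` and `cosine_similarity` are assumptions made for this example.

```python
import math
from collections import Counter


def boosted_vector(doc, boosts):
    """Turn {field: [tokens]} into a sparse vector keyed by (field, token).

    Each occurrence of a token adds the boost of its field (default 1.0),
    so matches in heavily boosted fields dominate the similarity.
    """
    vec = Counter()
    for field, tokens in doc.items():
        boost = boosts.get(field, 1.0)
        for token in tokens:
            vec[(field, token)] += boost
    return vec


def cosine_similarity(u, v):
    """Cosine of the angle between two sparse vectors (0.0 if either is empty)."""
    dot = sum(u[key] * v[key] for key in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)
```

With a boost of 5 on one field and 1 on another, two items that agree only in the boosted field score much closer to 1 than two items that agree only in the other field.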

+2

You can read any book on machine learning, for example this one. The KNN (k-nearest neighbors) algorithm addresses your problem. Basically, you define a distance measure that suits your problem and then compare the distances.
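The search step of KNN can be sketched as a brute-force scan, which is usually adequate for a few thousand elements. The function names are illustrative, and `manhattan` is only a placeholder for whatever distance measure fits the problem.

```python
import heapq


def manhattan(a, b):
    """Placeholder distance: sum of absolute per-property differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def k_nearest(target, candidates, k, distance=manhattan):
    """Return the k candidates with the smallest distance to target.

    heapq.nsmallest avoids sorting the whole candidate list when k is small.
    """
    return heapq.nsmallest(k, candidates, key=lambda c: distance(target, c))
```

For much larger collections, a spatial index or an approximate nearest neighbor method would be worth considering instead of scanning every element.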

+2

Source: https://habr.com/ru/post/1386339/

