MySQL: what is the best approach for ranking results by closest possible match?

I have a MySQL database that I need to search. Let's say it is a database of people. When you request a specific record, you can find a 100% match on every attribute. But querying the database for the closest probable match (the record whose attributes match best) is more a question of strategy.

In this scenario, does it make sense to create a temporary table (like a tally sheet) to record which attributes match / which attributes are present? What is a typical approach to this kind of advanced database search?

Example (below) of a hypothetical stored procedure

Parameters

*This is just an example of how I would search. I am not worried about how to perform my selections. The question is about approach, strategy, technique.*

    call FindPerson("Brown Eyes", "Brown hair", "Height:6'1", "white", "Name:Joe", "weight180", "Age 34", "sex m");

    RESULT TABLE

    NAME   AGE  HEIGHT  WEIGHT  HAIR   SKIN   sex  RANK_MATCH
    Joe    32   6'1     180     Brown  white  m    1
    Mike   33   6'1     179     Brown  white  m    2
    James  31   6'0     179     Brown  black  m    3
2 answers

Just off the top of my head: you can compute your own score and sort by it. Something like:

    SELECT `id`,
           (IF(`age` = 32, 1, 0) + IF(`height` = "6'1", 1, 0) + ...) AS `score`
    FROM `people`
    HAVING `score` > 0
    ORDER BY `score` DESC
    LIMIT 10;

With this, you can give each field its own comparison and also weight individual attributes by adding 2 or more instead of just 1. But I'm not sure how good this approach is.
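To make the weighting idea concrete, here is a minimal sketch. It assumes an in-memory SQLite database as a stand-in for MySQL, a hypothetical `people` table with made-up rows, and illustrative weights of 3, 2 and 1. `CASE WHEN` is used in place of `IF()` only because it works in both engines.

```python
import sqlite3

# In-memory SQLite stands in for MySQL here; table and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT, age INTEGER, "
    "height TEXT, hair TEXT)"
)
conn.executemany(
    "INSERT INTO people (name, age, height, hair) VALUES (?, ?, ?, ?)",
    [
        ("Joe", 32, "6'1", "Brown"),
        ("Mike", 33, "6'1", "Brown"),
        ("James", 31, "6'0", "Brown"),
        ("Anna", 40, "5'4", "Blond"),
    ],
)

# Each CASE adds its weight when the attribute matches, otherwise 0.
# The weights (3, 2, 1) are illustrative assumptions, not tuned values.
query = """
    SELECT name, score FROM (
        SELECT name,
               (CASE WHEN age = :age THEN 3 ELSE 0 END
                + CASE WHEN height = :height THEN 2 ELSE 0 END
                + CASE WHEN hair = :hair THEN 1 ELSE 0 END) AS score
        FROM people
    ) AS scored
    WHERE score > 0
    ORDER BY score DESC, name
    LIMIT 10
"""
ranked = conn.execute(
    query, {"age": 32, "height": "6'1", "hair": "Brown"}
).fetchall()
print(ranked)
```

Rows that match nothing score 0 and are filtered out, so only candidates with at least one matching attribute are ranked.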


The approach I would use is to create a scoring function (your stored procedure) that evaluates how far a given input is from the average, measured in standard deviations.

In the procedure, you evaluate each criterion the same way:

    INPUT AGE: 32
    calculate MEAN of AGE WHERE (sex = m): 34.5
    calculate STANDARD DEVIATION of AGE WHERE (sex = m): 2.5
    calculate how many STDEVs 32 is from 34.5 (also known as the z-score): 1

Repeat this process for every numeric attribute, add the results up, and ORDER BY the sum.
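Written out, the arithmetic in the steps above looks like this. The symbols are assumed notation: $x$ is the input value, $\mu$ the mean and $\sigma$ the standard deviation of the attribute, and the absolute value is taken so that the result is a distance.

```latex
z_{\text{age}} = \frac{\lvert x - \mu \rvert}{\sigma}
               = \frac{\lvert 32 - 34.5 \rvert}{2.5}
               = 1,
\qquad
\text{score} = z_{\text{age}} + z_{\text{wt}} + \dots
```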

This requires one schema change: height must be stored strictly in inches instead of in feet / inches notation.

Depending on your needs, you could also come up with a custom scale for sex and for skin color / hair color. Of course, you may decide that such attributes should not be counted at all, given how radically they would change the scoring function. If you do include them, you will need to find some number to add to the SUM above ... but this is difficult, because nominal variables do not translate easily into numbers.

If you find that hair color / skin color can be conveniently expressed as, say, a position on a continuous color spectrum, the scoring step stays the same: compare the color value of the input against the mean and standard deviation of the stored color values.
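As a rough sketch of that conversion, assuming heights are currently stored as strings like `6'1` (the function name `height_to_inches` is hypothetical):

```python
def height_to_inches(height: str) -> int:
    """Convert a feet'inches string such as "6'1" to total inches."""
    # Drop surrounding whitespace and an optional trailing inch mark (").
    feet, _, inches = height.strip().rstrip('"').partition("'")
    # A missing inches part (e.g. "6'") is treated as 0 inches.
    return int(feet) * 12 + int(inches or 0)


print(height_to_inches("6'1"))
```

The converted value can then go into a plain numeric column, so it can be averaged and compared like age or weight.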

A query that finds your matches will be something like:

    SELECT ABS(INPUT_AGE - AVG(AGE)) / STD(AGE) AS age_z,
           ABS(INPUT_WT - AVG(WT)) / STD(WT) AS wt_z,
           ...
           (age_z + wt_z + ...) AS score
    FROM `table`
    ORDER BY score ASC
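Taken literally, the query above would collapse to a single row, because `AVG` and `STD` are aggregates, and MySQL does not allow a column alias to be referenced in the same SELECT list. One possible reading, and only an assumption about the intent, is to score each row by its distance from the input in standard deviations. A minimal sketch with in-memory SQLite standing in for MySQL and a hypothetical `people` table:

```python
import sqlite3
import statistics

# In-memory SQLite stands in for MySQL; rows are made up, height is in inches.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, age REAL, wt REAL, height REAL)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?, ?, ?)",
    [
        ("Joe", 32, 180, 73),
        ("Mike", 33, 179, 73),
        ("James", 31, 179, 72),
        ("Anna", 40, 130, 64),
    ],
)


def column_sd(column: str) -> float:
    """Population standard deviation of one column (what MySQL's STD() returns)."""
    values = [row[0] for row in conn.execute(f"SELECT {column} FROM people")]
    return statistics.pstdev(values)


# SQLite has no STD(), so the deviations are computed in Python; in MySQL a
# subquery selecting STD(age), STD(wt), STD(height) could supply them instead.
params = {
    "input_age": 32, "input_wt": 180, "input_height": 73,
    "age_sd": column_sd("age"),
    "wt_sd": column_sd("wt"),
    "height_sd": column_sd("height"),
}

# Per-row distance from the input, in standard deviations; smaller is closer.
query = """
    SELECT name,
           ABS(age - :input_age) / :age_sd
           + ABS(wt - :input_wt) / :wt_sd
           + ABS(height - :input_height) / :height_sd AS score
    FROM people
    ORDER BY score ASC
"""
ranked = [name for name, _ in conn.execute(query, params)]
print(ranked)
```

Dividing by the standard deviation puts age, weight and height on a comparable scale, so a one-year difference and a one-pound difference are not treated as equally important.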

Source: https://habr.com/ru/post/1396903/

