SQL - choosing the most similar product

OK, I have a relation that stores two keys: a product identifier and an attribute identifier. I want to find out which product is most similar to a given product. (The attributes are actually numbers, but that made the example more confusing, so I changed them to letters to keep the presentation simple.)

Prod_att

Product | Attributes
1       | A
1       | B
1       | C
2       | A
2       | B
2       | D
3       | A
3       | E
4       | A

This initially seems pretty straightforward: select the attributes that the given product has, then count how many of those attributes each other product has. That count is then compared with the number of attributes the given product has, which tells me how similar the two products are. This works when the given product has more attributes than the products it is compared with, but problems arise when a product has very few attributes. For example, product 3 will tie with almost every other product (since A is very common).

    SELECT Product, count(Attributes)
    FROM Prod_att
    WHERE Attributes IN (SELECT Attributes FROM prod_att WHERE Product = 1)
    GROUP BY Product;

Any suggestions to fix this or improve my current query?
Thanks!

* edit: For product 4, every other product returns count() = 1. I would like the result to show that product 3 is the most similar, as it has the fewest differences.

+4
3 answers

Try this:

    SELECT a_product_id,
           COALESCE(b_product_id, 'no_matchs_found') AS closest_product_match
    FROM (
        SELECT *,
               -- number the rows within each a_product_id; row 1 is the best match
               @row_num := IF(@prev_value = a_product_id, @row_num + 1, 1) AS row_num,
               @prev_value := a_product_id
        FROM (SELECT @prev_value := 0) r
        JOIN (
            SELECT a.product_id AS a_product_id,
                   b.product_id AS b_product_id,
                   count(DISTINCT b.Attributes),                    -- attributes shared by a and b
                   count(DISTINCT b2.Attributes) AS total_products  -- all attributes of b
            FROM products a
            LEFT JOIN products b
                   ON (a.Attributes = b.Attributes AND a.product_id <> b.product_id)
            LEFT JOIN products b2
                   ON (b2.product_id = b.product_id)
            /* WHERE a.product_id = 3 */
            GROUP BY a.product_id, b.product_id
            -- most shared attributes first; ties go to the product with fewer attributes
            ORDER BY 1, 3 DESC, 4
        ) t
    ) t2
    WHERE row_num = 1

The query above finds the closest match for every product. To get the result for one specific product, uncomment the WHERE clause in the innermost query and set the product_id. I used LEFT JOIN so that a product is still displayed even if it has no matches.
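
The ranking rule used above (most shared attributes first, ties going to the candidate with the fewest attributes of its own) can also be sketched without MySQL user variables. The following is only an illustration, run through Python's sqlite3 module so it is self-contained. The table name prod_att, the column names product and attribute, and the helper closest_matches are assumptions made for this sketch, and products with no shared attribute at all are simply left out of the result.

```python
import sqlite3

# Sample rows from the question (table and column names are assumed).
ROWS = [(1, "A"), (1, "B"), (1, "C"),
        (2, "A"), (2, "B"), (2, "D"),
        (3, "A"), (3, "E"),
        (4, "A")]

# Assumes each (product, attribute) pair appears at most once.
RANKED_PAIRS = """
WITH totals AS (
    -- how many attributes each product has in total
    SELECT product, COUNT(*) AS n_attrs
    FROM prod_att
    GROUP BY product
),
pairs AS (
    -- how many attributes each pair of distinct products shares
    SELECT a.product AS product, b.product AS other, COUNT(*) AS shared
    FROM prod_att a
    JOIN prod_att b
      ON b.attribute = a.attribute AND b.product <> a.product
    GROUP BY a.product, b.product
)
SELECT p.product, p.other
FROM pairs p
JOIN totals t ON t.product = p.other
-- best candidate first: most shared, then fewest attributes overall
ORDER BY p.product, p.shared DESC, t.n_attrs ASC, p.other
"""


def closest_matches(rows):
    """Return {product: closest other product} for the given rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE prod_att (product INTEGER, attribute TEXT)")
    conn.executemany("INSERT INTO prod_att VALUES (?, ?)", rows)
    closest = {}
    for product, other in conn.execute(RANKED_PAIRS):
        # rows arrive best-first, so keep only the first one per product
        closest.setdefault(product, other)
    conn.close()
    return closest


print(closest_matches(ROWS))  # {1: 2, 2: 1, 3: 4, 4: 3}
```

With the sample data, product 4 is matched to product 3 rather than to products 1 or 2, which is the behaviour asked for in the question's edit.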

SQLFIDDLE

Hope this helps

+2

Try the "lower bound of the Wilson score confidence interval for a Bernoulli parameter." It directly addresses the problem of statistical confidence when n is small. It looks like a lot of math, but it is actually about the minimum amount of math you need to get this right. And the site explains it pretty well.

This assumes you can make the step from positive/negative counts to your attribute match/mismatch problem.

Here is an example with positive and negative counts and a 95% confidence level:

    SELECT widget_id,
           ((positive + 1.9208) / (positive + negative)
            - 1.96 * SQRT((positive * negative) / (positive + negative) + 0.9604)
              / (positive + negative))
           / (1 + 3.8416 / (positive + negative)) AS ci_lower_bound
    FROM widgets
    WHERE positive + negative > 0
    ORDER BY ci_lower_bound DESC;
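
One possible way to make that step, shown here only as a sketch: treat the attributes two products share as "positive" and the attributes that only one of them has as "negative", then rank candidates by the same lower bound. This mapping is an assumption of the example, not something the formula prescribes. The table prod_att, its columns, and the helper wilson_ranking are hypothetical names. The SQL is run through Python's sqlite3 module, SQRT is registered from Python in case the SQLite build lacks it, and the product is multiplied by 1.0 because SQLite truncates integer division.

```python
import math
import sqlite3

# Sample rows from the question (table and column names are assumed).
ROWS = [(1, "A"), (1, "B"), (1, "C"),
        (2, "A"), (2, "B"), (2, "D"),
        (3, "A"), (3, "E"),
        (4, "A")]

# Assumes each (product, attribute) pair appears at most once.
WILSON_QUERY = """
WITH totals AS (
    SELECT product, COUNT(*) AS n_attrs
    FROM prod_att
    GROUP BY product
),
pairs AS (
    -- positive = attributes the candidate shares with the target
    SELECT b.product AS other, COUNT(*) AS positive
    FROM prod_att a
    JOIN prod_att b
      ON b.attribute = a.attribute AND b.product <> a.product
    WHERE a.product = :target
    GROUP BY b.product
),
scored AS (
    -- negative = attributes held by exactly one of the two products
    SELECT p.other, p.positive,
           (SELECT n_attrs FROM totals WHERE product = :target)
             + t.n_attrs - 2 * p.positive AS negative
    FROM pairs p
    JOIN totals t ON t.product = p.other
)
SELECT other, positive, negative,
       ((positive + 1.9208) / (positive + negative)
        - 1.96 * SQRT((1.0 * positive * negative) / (positive + negative) + 0.9604)
          / (positive + negative))
       / (1 + 3.8416 / (positive + negative)) AS ci_lower_bound
FROM scored
ORDER BY ci_lower_bound DESC, other
"""


def wilson_ranking(rows, target):
    """Return (other, positive, negative, ci_lower_bound) rows, best first."""
    conn = sqlite3.connect(":memory:")
    conn.create_function("SQRT", 1, math.sqrt)
    conn.execute("CREATE TABLE prod_att (product INTEGER, attribute TEXT)")
    conn.executemany("INSERT INTO prod_att VALUES (?, ?)", rows)
    ranking = conn.execute(WILSON_QUERY, {"target": target}).fetchall()
    conn.close()
    return ranking


for row in wilson_ranking(ROWS, 4):
    print(row)
```

For product 4 in the sample data, product 3 (one shared attribute, one difference) scores higher than products 1 and 2 (one shared attribute, two differences each), so the tie described in the question is broken in favour of the product with fewer differences.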
0

You can write a small view that gives you the number of attributes shared between two products.

    create view vw_shared_attributes as
    select a.product,
           b.product as product_match,
           count(*) as shared_attributes
    from your_table a
    inner join your_table b
            on b.attribute = a.attribute
           and b.product <> a.product
    group by a.product, b.product

and then use this view to select the top match.

    select product,
           (select top 1 s.product_match
            from vw_shared_attributes s
            where t.product = s.product
            order by s.shared_attributes desc) as product_match
    from your_table t
    group by product

See http://www.sqlfiddle.com/#!6/53039/1 for an example

0

Source: https://habr.com/ru/post/1479785/

