One-to-many relationship between table columns: grouping and finding combinations

In the example table t0:

OrderID | ProductID
 0001      1254
 0001      1252
 0002      0038
 0003      1254
 0003      1252
 0003      1432
 0004      0038
 0004      1254
 0004      1252  

I need to find the most popular combination of two ProductIDs within a single OrderID. The goal is to determine which products are most likely to be sold together in the same order, for example a telephone and a speakerphone. I think the logic is to group by OrderID, generate every possible pair of ProductIDs, count how many orders contain each pair, and select the TOP 2, but I'm not sure whether this is feasible.

4 answers

Sample data:

CREATE TABLE OrderDetail
    ([OrderID] int, [ProductID] int)
;

INSERT INTO OrderDetail
    ([OrderID], [ProductID])
VALUES
    (0001, 1254), (0001, 1252), (0002, 0038), (0003, 1254), (0003, 1252), (0003, 1432), (0004, 0038), (0004, 1254), (0004, 1252)
;

Query 1:

select -- top(2)
      od1.ProductID, od2.ProductID, count(*) count_of
from OrderDetail od1
inner join OrderDetail od2 on od1.OrderID = od2.OrderID and od2.ProductID > od1.ProductID
group by
      od1.ProductID, od2.ProductID
order by
      count_of DESC

Result:

| ProductID | ProductID | count_of |
|-----------|-----------|----------|
|      1252 |      1254 |        3 |
|      1252 |      1432 |        1 |
|      1254 |      1432 |        1 |
|        38 |      1252 |        1 |
|        38 |      1254 |        1 |
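The counts in the result above can be cross-checked outside the database. The following is a minimal Python sketch, not part of the original answer: the rows are the sample data typed in by hand, and the function name count_pairs is a hypothetical helper chosen for illustration. It groups products by order, generates each unordered pair once, and counts how many orders contain it.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Sample rows as (OrderID, ProductID), copied from the example table.
rows = [
    (1, 1254), (1, 1252),
    (2, 38),
    (3, 1254), (3, 1252), (3, 1432),
    (4, 38), (4, 1254), (4, 1252),
]

def count_pairs(rows):
    """Count how many orders contain each unordered product pair."""
    products_by_order = defaultdict(set)
    for order_id, product_id in rows:
        # A set ignores a product listed twice in the same order.
        products_by_order[order_id].add(product_id)

    pair_counts = Counter()
    for products in products_by_order.values():
        # sorted() makes every pair appear as (smaller, larger) exactly once,
        # which mirrors the od2.ProductID > od1.ProductID join condition.
        pair_counts.update(combinations(sorted(products), 2))
    return pair_counts

pair_counts = count_pairs(rows)
for pair, count in pair_counts.most_common():
    print(pair, count)
```

The pair (1252, 1254) is counted 3 times and the other four pairs once each, which agrees with the result table.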

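----

The "top 2" part is harder. If several pairs can share the same count, dense_rank() gives tied pairs the same rank, so none of them is cut off arbitrarily. The query below ranks the pairs by count and then lists the products that belong to the top-ranked pairs.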

with ProductPairs as (
      select 
             p1, p2, count_pair
          , dense_rank() over(order by count_pair DESC) as ranked
      from (
            select
                  od1.ProductID p1, od2.ProductID p2, count(*) count_pair
            from OrderDetail od1
            inner join OrderDetail od2 on od1.OrderID = od2.OrderID and od2.ProductID > od1.ProductID
            group by
                  od1.ProductID, od2.ProductID
            ) d
      )
, RankedProducts as (
       select p1 as ProductID, ranked, count_pair
       from ProductPairs
       union all
       select p2 as ProductID, ranked, count_pair
       from ProductPairs
       )
select *
from RankedProducts
where ranked <= 2
order by ranked, ProductID
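To see what filtering on a dense rank does with ties, here is a small Python sketch. It is an illustration only: dense_rank below is a hypothetical helper that imitates the SQL window function for this one case, and sample_counts is the result of query 1 typed in by hand.

```python
def dense_rank(pair_counts):
    """Return {pair: rank}; equal counts share a rank and ranks have no gaps."""
    distinct_counts = sorted(set(pair_counts.values()), reverse=True)
    rank_of_count = {
        count: rank for rank, count in enumerate(distinct_counts, start=1)
    }
    return {pair: rank_of_count[count] for pair, count in pair_counts.items()}

# Pair counts for the sample data (the result of query 1).
sample_counts = {
    (1252, 1254): 3,
    (1252, 1432): 1,
    (1254, 1432): 1,
    (38, 1252): 1,
    (38, 1254): 1,
}

ranks = dense_rank(sample_counts)
# Equivalent of "where ranked <= 2": keep every pair in the two best ranks.
top_two_ranks = sorted(pair for pair, rank in ranks.items() if rank <= 2)
print(top_two_ranks)
```

With the sample data, four pairs tie at rank 2, so the filter keeps all five pairs rather than exactly two rows.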

A self-join on the same table:

SELECT T1.productID, T2.productID, COUNT(*) AS Occurrence
FROM TBL T1 INNER JOIN TBL T2
ON T1.orderID = T2.orderID
WHERE T1.productID > T2.productID
-- Group by the pair only; also grouping by orderID would make every count 1
GROUP BY T1.productID, T2.productID
ORDER BY Occurrence DESC

  WITH products as (
       SELECT DISTINCT ProductID
       FROM orders
  ), permutations as (
      SELECT p1.ProductID as pidA, 
             p2.ProductID as pidB
      FROM products p1
      JOIN products p2
        ON p1.ProductID < p2.ProductID
  ), check_frequency as (
      SELECT pidA, pidB, COUNT (o2.orderID) total_orders
      FROM permutations p
      LEFT JOIN orders o1
        ON p.pidA = o1.ProductID
      LEFT JOIN orders o2
        ON p.pidB = o2.ProductID
       AND o1.orderID = o2.orderID
      GROUP BY pidA, pidB
  )
  SELECT TOP 2 *
  FROM check_frequency
  ORDER BY total_orders DESC

The following query counts the number of two-product combinations across all orders in Orderline:

SELECT SUM(numprods * (numprods - 1)/2) as numcombo2 
FROM ( SELECT orderid, COUNT(DISTINCT productid) as numprods
      FROM orderline ol 
      GROUP BY orderid ) o
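The query relies on the fact that an order with n distinct products contains n * (n - 1) / 2 two-product combinations. As a sanity check, here is a minimal Python sketch that applies the same formula to the nine sample rows from the question (not to the Orderline data of the source); the function name is a hypothetical choice.

```python
from collections import defaultdict

# Sample rows as (OrderID, ProductID), copied from the question.
rows = [
    (1, 1254), (1, 1252),
    (2, 38),
    (3, 1254), (3, 1252), (3, 1432),
    (4, 38), (4, 1254), (4, 1252),
]

def count_two_way_combinations(rows):
    """Sum n * (n - 1) / 2 over orders, n = distinct products in the order."""
    products_by_order = defaultdict(set)
    for order_id, product_id in rows:
        products_by_order[order_id].add(product_id)
    return sum(
        len(products) * (len(products) - 1) // 2
        for products in products_by_order.values()
    )

total = count_two_way_combinations(rows)
print(total)  # 7: the orders hold 2, 1, 3 and 3 products, giving 1 + 0 + 3 + 3
```

The total of 7 equals the sum of the count_of column in the first answer's result (3 + 1 + 1 + 1 + 1).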

For the data set used in the source, the figure given is 185,791 two-way combinations.

The approach to calculating the combinations is a self-join on the Orderline table with duplicate product pairs removed. The goal is to get all pairs of products under two conditions: the two products in a pair must be different, and the same two products must not appear as two separate pairs (A, B and B, A). The first condition is easily met by filtering out any pair where the two products are equal. The second is also easily met by requiring that the first product identifier be less than the second, which covers the first condition as well. The following query generates all the combinations in a subquery and counts the number of orders containing each of them:

SELECT p1, p2, COUNT(*) as numorders
FROM (SELECT op1.orderid, op1.productid as p1, op2.productid as p2
      FROM (SELECT DISTINCT orderid, productid FROM orderline) op1 JOIN
           (SELECT DISTINCT orderid, productid FROM orderline) op2
           ON op1.orderid = op2.orderid AND
              op1.productid < op2.productid
     ) combinations
GROUP BY p1, p2
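The role of SELECT DISTINCT in the query above can be seen by running it on a tiny table. The sketch below uses Python's built-in sqlite3 module with an in-memory database; the five rows are made-up test data (an assumption, not the source's Orderline table), with product 1252 deliberately listed twice in order 1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orderline (orderid INTEGER, productid INTEGER)")
conn.executemany(
    "INSERT INTO orderline (orderid, productid) VALUES (?, ?)",
    [
        (1, 1254), (1, 1252), (1, 1252),  # 1252 appears twice in order 1
        (2, 1254), (2, 1252),
    ],
)

query = """
SELECT p1, p2, COUNT(*) AS numorders
FROM (SELECT op1.orderid, op1.productid AS p1, op2.productid AS p2
      FROM (SELECT DISTINCT orderid, productid FROM orderline) op1
      JOIN (SELECT DISTINCT orderid, productid FROM orderline) op2
        ON op1.orderid = op2.orderid
       AND op1.productid < op2.productid
     ) combinations
GROUP BY p1, p2
"""

result = conn.execute(query).fetchall()
print(result)  # [(1252, 1254, 2)]
conn.close()
```

The pair is counted once per order, so numorders is 2. Without the DISTINCT subqueries, the duplicated row in order 1 would be joined twice and the count would come out as 3.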

Source: Data Analysis Using SQL and Excel


Source: https://habr.com/ru/post/1689649/

