Search for duplicate orders (by time proximity)

I have a table of orders that I know contains duplicates:

    customer   order_number order_date
    ---------- ------------ -------------------
    1          1            2012-03-01 01:58:00
    1          2            2012-03-01 02:01:00
    1          3            2012-03-01 02:03:00
    2          4            2012-03-01 02:15:00
    3          5            2012-03-01 02:18:00
    3          6            2012-03-01 04:30:00
    4          7            2012-03-01 04:35:00
    5          8            2012-03-01 04:38:00
    6          9            2012-03-01 04:58:00
    6          10           2012-03-01 04:59:00

I want to find all duplicates (orders from the same customer placed within 60 minutes of each other). I would accept either a result set listing the duplicate rows or a list of customers with a count of their duplicates.

Here is what I tried

    SELECT customer, count(*)
    FROM orders
    GROUP BY customer, DATEPART(HOUR, order_date)
    HAVING (count(*) > 1)

This does not work when duplicates are within 60 minutes of each other but fall in different clock hours, e.g. 1:58 and 2:02.
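A minimal illustration of that boundary problem, using two timestamps from the sample data. The variable names `@a` and `@b` exist only for this sketch, and inline initialization in `DECLARE` assumes SQL Server 2008 or later.

```sql
-- Two orders from customer 1, three minutes apart but in different clock hours.
DECLARE @a DATETIME = '2012-03-01 01:58:00';
DECLARE @b DATETIME = '2012-03-01 02:01:00';

SELECT DATEPART(HOUR, @a)       AS hour_a,        -- 1
       DATEPART(HOUR, @b)       AS hour_b,        -- 2
       DATEDIFF(MINUTE, @a, @b) AS minutes_apart; -- 3
```

Grouping by `DATEPART(HOUR, ...)` puts these two rows in separate groups, so neither group reaches `count(*) > 1` even though the orders are only three minutes apart.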

I also tried this

    SELECT o1.customer,
           o1.order_number,
           o2.order_number,
           DATEDIFF(MINUTE, o1.order_date, o2.order_date) AS [diff]
    FROM orders o1
    LEFT OUTER JOIN orders o2
        ON o1.customer = o2.customer
       AND o1.order_number <> o2.order_number
    WHERE ABS(DATEDIFF(MINUTE, o1.order_date, o2.order_date)) < 60

Now this gives me all the duplicates, but it also gives me several rows per duplicate, i.e. (o1, o2) and (o2, o1). That would not be so bad if there weren't some orders with several duplicates. In those cases, I get (o1, o2), (o1, o3), (o2, o1), (o2, o3), (o3, o1), (o3, o2), etc.: all the permutations.

Does anyone have any ideas? I'm not necessarily looking for the best answer here, only one that works.

3 answers
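    SELECT *,
           CASE WHEN EXISTS (
                    -- An earlier order by the same customer within the preceding hour
                    SELECT *
                    FROM orders AS lookup
                    WHERE lookup.customer = orders.customer
                      AND lookup.order_date < orders.order_date
                      AND lookup.order_date >= DATEADD(hour, -1, orders.order_date)
                )
                THEN 'Duplicate Order'   -- an earlier order exists, so this row repeats it
                ELSE 'Principal Order'   -- no earlier order within the hour
           END AS Order_Status
    FROM orders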

Using EXISTS and a correlated subquery, you can check whether the same customer placed any earlier order within the hour before each order.
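The question also asks for a per-customer count. As a sketch only, the same EXISTS test can be moved into a WHERE clause and grouped. It assumes the `orders` table from the question; the aliases `o` and `earlier` and the column name `duplicate_count` are chosen just for this example.

```sql
-- Count, per customer, the orders that follow an earlier order
-- by the same customer within the preceding hour.
SELECT o.customer,
       COUNT(*) AS duplicate_count
FROM orders AS o
WHERE EXISTS (
          SELECT *
          FROM orders AS earlier
          WHERE earlier.customer   = o.customer
            AND earlier.order_date <  o.order_date
            AND earlier.order_date >= DATEADD(HOUR, -1, o.order_date)
      )
GROUP BY o.customer;
```

With the sample data this should return customer 1 with a count of 2 (orders 2 and 3) and customer 6 with a count of 1 (order 10). Each order is compared against any earlier order, not only the first one, so a long chain of orders spaced less than an hour apart is flagged in full.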


Maybe something like this:

Test data:

    DECLARE @tbl TABLE(customer INT, order_number INT, order_date DATETIME)

    INSERT INTO @tbl
    VALUES
        (1, 1,  '2012-03-01 01:58:00'),
        (1, 2,  '2012-03-01 02:01:00'),
        (1, 3,  '2012-03-01 02:03:00'),
        (2, 4,  '2012-03-01 02:15:00'),
        (3, 5,  '2012-03-01 02:18:00'),
        (3, 6,  '2012-03-01 04:30:00'),
        (4, 7,  '2012-03-01 04:35:00'),
        (5, 8,  '2012-03-01 04:38:00'),
        (6, 9,  '2012-03-01 04:58:00'),
        (6, 10, '2012-03-01 04:59:00')

Query

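    ;WITH CTE AS (
        SELECT
            -- Minutes since 1990-01-01 of the customer's earliest order
            MIN(datediff(minute, '1990-1-1', order_date)) OVER (PARTITION BY customer) AS minDate,
            -- Minutes since 1990-01-01 of this order
            datediff(minute, '1990-1-1', order_date) AS DateTicks,
            tbl.customer
        FROM @tbl AS tbl
    )
    SELECT CTE.customer,
           -- Orders within 60 minutes of the customer's first order (the first order itself is counted)
           SUM(CASE WHEN (CTE.DateTicks - CTE.minDate) < 60 THEN 1 ELSE 0 END)
    FROM CTE
    GROUP BY CTE.customer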

The following query identifies all permutations of orders placed by the same customer within 60 minutes of each other:

    DECLARE @orders TABLE (CustomerId INT, OrderId INT, OrderDate DATETIME)

    INSERT INTO @orders
    VALUES
        (1, 1,  '2012-03-01 01:58:00'),
        (1, 2,  '2012-03-01 02:01:00'),
        (1, 3,  '2012-03-01 02:03:00'),
        (2, 4,  '2012-03-01 02:15:00'),
        (3, 5,  '2012-03-01 02:18:00'),
        (3, 6,  '2012-03-01 04:30:00'),
        (4, 7,  '2012-03-01 04:35:00'),
        (5, 8,  '2012-03-01 04:38:00'),
        (6, 9,  '2012-03-01 04:58:00'),
        (6, 10, '2012-03-01 04:59:00');

    with ProximityOrderCascade(CustomerId, OrderId, ProximateOrderId, MinutesDifference, OrderDate, ProximateOrderDate)
    as
    (
        select o.customerid, o.orderid, null, null, o.orderdate, o.orderdate
        from @orders o
        union all
        select o.customerid, o.orderid, p.orderid,
               datediff(minute, p.OrderDate, o.OrderDate),
               o.OrderDate, p.OrderDate
        from ProximityOrderCascade p
        inner join @orders o
            on p.customerid = o.customerid
           and abs(datediff(minute, p.OrderDate, o.OrderDate)) between 0 and 60
           and o.orderid <> p.orderid
        where proximateorderid is null
    )
    select *
    from ProximityOrderCascade
    where not ProximateOrderId is null

From there, you can adapt the results to whatever query you need. The results of this query identify only customers 1 and 6 as having "duplicate" orders.

    CustomerId  OrderId     ProximateOrderId MinutesDifference OrderDate               ProximateOrderDate
    ----------- ----------- ---------------- ----------------- ----------------------- -----------------------
    6           9           10               -1                2012-03-01 04:58:00.000 2012-03-01 04:59:00.000
    6           10          9                1                 2012-03-01 04:59:00.000 2012-03-01 04:58:00.000
    1           1           3                -5                2012-03-01 01:58:00.000 2012-03-01 02:03:00.000
    1           2           3                -2                2012-03-01 02:01:00.000 2012-03-01 02:03:00.000
    1           1           2                -3                2012-03-01 01:58:00.000 2012-03-01 02:01:00.000
    1           3           2                2                 2012-03-01 02:03:00.000 2012-03-01 02:01:00.000
    1           2           1                3                 2012-03-01 02:01:00.000 2012-03-01 01:58:00.000
    1           3           1                5                 2012-03-01 02:03:00.000 2012-03-01 01:58:00.000

    (8 row(s) affected)
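If each pair should appear only once rather than as both (o1, o2) and (o2, o1), one possible adaptation is a plain self-join that keeps only the rows where the first order id is the lower one. This is a sketch, not part of the query above: it assumes it runs in the same batch as the `@orders` declaration, and the output column names `LowerOrderId` and `HigherOrderId` are made up for the example.

```sql
-- One row per unordered pair of orders from the same customer
-- placed at most 60 minutes apart.
SELECT o1.CustomerId,
       o1.OrderId AS LowerOrderId,
       o2.OrderId AS HigherOrderId,
       DATEDIFF(MINUTE, o1.OrderDate, o2.OrderDate) AS MinutesDifference
FROM @orders AS o1
INNER JOIN @orders AS o2
        ON o2.CustomerId = o1.CustomerId
       AND o1.OrderId < o2.OrderId   -- drops the mirrored (o2, o1) rows
       AND ABS(DATEDIFF(MINUTE, o1.OrderDate, o2.OrderDate)) <= 60
ORDER BY o1.CustomerId, o1.OrderId, o2.OrderId;
```

On the sample data this should leave four rows: pairs (1, 2), (1, 3) and (2, 3) for customer 1, and pair (9, 10) for customer 6. Wrapping it in a `GROUP BY CustomerId` would give a per-customer pair count.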

Source: https://habr.com/ru/post/909834/

