Search for duplicate orders (by time proximity)

Question


I have a table of orders that I know contains duplicates:

    customer   order_number order_date
    ---------- ------------ -------------------
    1          1            2012-03-01 01:58:00
    1          2            2012-03-01 02:01:00
    1          3            2012-03-01 02:03:00
    2          4            2012-03-01 02:15:00
    3          5            2012-03-01 02:18:00
    3          6            2012-03-01 04:30:00
    4          7            2012-03-01 04:35:00
    5          8            2012-03-01 04:38:00
    6          9            2012-03-01 04:58:00
    6          10           2012-03-01 04:59:00

I want to find all duplicates (orders from the same customer within 60 minutes of each other). I need either a result set consisting of the duplicate rows, or a list of all customers with a count of their duplicates.

Here is what I tried

    SELECT customer, count(*)
    FROM orders
    GROUP BY customer, DATEPART(HOUR, order_date)
    HAVING (count(*) > 1)

This does not work when duplicates are within 60 minutes of each other but fall in different clock hours, e.g. 1:58 and 2:02.
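To make the hour-boundary problem concrete, here is a minimal Python sketch (not part of the original post) using two timestamps from the sample data. It only illustrates the logic; the variable names are made up for the example.

```python
from datetime import datetime

# Two orders from customer 1 in the sample data, three minutes apart.
first = datetime(2012, 3, 1, 1, 58)
second = datetime(2012, 3, 1, 2, 1)

# Grouping by DATEPART(HOUR, ...) only compares the hour component,
# so these two orders land in different groups.
same_hour_bucket = first.hour == second.hour

# Comparing the actual gap is independent of where the hour boundary falls.
gap_minutes = abs((second - first).total_seconds()) / 60
within_60 = gap_minutes < 60

print(same_hour_bucket, gap_minutes, within_60)
```

The pair is three minutes apart, yet the hour comparison puts the two orders in separate buckets, which is why the GROUP BY approach misses them.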

I also tried this

    SELECT o1.customer,
           o1.order_number,
           o2.order_number,
           DATEDIFF(MINUTE, o1.order_date, o2.order_date) AS [diff]
    FROM orders o1
    LEFT OUTER JOIN orders o2
      ON o1.customer = o2.customer
     AND o1.order_number <> o2.order_number
    WHERE ABS(DATEDIFF(MINUTE, o1.order_date, o2.order_date)) < 60

This does give me all the duplicates, but it also returns several rows per duplicate, i.e. both (o1, o2) and (o2, o1). That would not be so bad if some orders did not have several duplicates. In those cases I get (o1, o2), (o1, o3), (o2, o1), (o2, o3), (o3, o1), (o3, o2), etc.: every permutation.
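One common way to get each pair only once is to join on `o1.order_number < o2.order_number` instead of `<>`. The sketch below is an assumption-laden illustration, not something from the thread: it uses Python's built-in `sqlite3` so it can run anywhere, and SQLite's `julianday()` stands in for T-SQL's `DATEDIFF`. On SQL Server the `WHERE` clause from the query above would stay as it is.

```python
import sqlite3

# Sample data from the question.
rows = [
    (1, 1, "2012-03-01 01:58:00"), (1, 2, "2012-03-01 02:01:00"),
    (1, 3, "2012-03-01 02:03:00"), (2, 4, "2012-03-01 02:15:00"),
    (3, 5, "2012-03-01 02:18:00"), (3, 6, "2012-03-01 04:30:00"),
    (4, 7, "2012-03-01 04:35:00"), (5, 8, "2012-03-01 04:38:00"),
    (6, 9, "2012-03-01 04:58:00"), (6, 10, "2012-03-01 04:59:00"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer INT, order_number INT, order_date TEXT)")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)

# julianday() differences are in days; multiplying by 1440 gives minutes.
# The "<" in the join keeps (o1, o2) and drops the mirrored (o2, o1).
pairs = conn.execute("""
    SELECT o1.customer, o1.order_number, o2.order_number
    FROM orders o1
    JOIN orders o2
      ON o1.customer = o2.customer
     AND o1.order_number < o2.order_number
    WHERE ABS(julianday(o2.order_date) - julianday(o1.order_date)) * 1440 < 60
    ORDER BY o1.order_number, o2.order_number
""").fetchall()

print(pairs)
```

With the sample data this prints `[(1, 1, 2), (1, 1, 3), (1, 2, 3), (6, 9, 10)]`: each unordered pair appears once, although an order with several duplicates still shows up in more than one pair.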

Does anyone have any insight? I'm not necessarily looking for the best answer here, just one that works.

+6

sql sql-server

Ben english Mar 2 '12 at 15:19


3 answers

Maybe something like this:

Test data:

    DECLARE @tbl TABLE (customer INT, order_number INT, order_date DATETIME)

    INSERT INTO @tbl
    VALUES
        (1, 1,  '2012-03-01 01:58:00'),
        (1, 2,  '2012-03-01 02:01:00'),
        (1, 3,  '2012-03-01 02:03:00'),
        (2, 4,  '2012-03-01 02:15:00'),
        (3, 5,  '2012-03-01 02:18:00'),
        (3, 6,  '2012-03-01 04:30:00'),
        (4, 7,  '2012-03-01 04:35:00'),
        (5, 8,  '2012-03-01 04:38:00'),
        (6, 9,  '2012-03-01 04:58:00'),
        (6, 10, '2012-03-01 04:59:00')

Query

    ;WITH CTE AS
    (
        SELECT
            MIN(datediff(minute, '1990-1-1', order_date)) OVER (PARTITION BY customer) AS minDate,
            datediff(minute, '1990-1-1', order_date) AS DateTicks,
            tbl.customer
        FROM @tbl AS tbl
    )
    SELECT
        CTE.customer,
        SUM(CASE WHEN (CTE.DateTicks - CTE.minDate) < 60 THEN 1 ELSE 0 END)
    FROM CTE
    GROUP BY CTE.customer

+1

Arion Mar 2 '12 at 15:46


The following query identifies all permutations of orders placed within 60 minutes of each other:

    DECLARE @orders TABLE (CustomerId INT, OrderId INT, OrderDate DATETIME)

    INSERT INTO @orders
    VALUES
        (1, 1,  '2012-03-01 01:58:00'),
        (1, 2,  '2012-03-01 02:01:00'),
        (1, 3,  '2012-03-01 02:03:00'),
        (2, 4,  '2012-03-01 02:15:00'),
        (3, 5,  '2012-03-01 02:18:00'),
        (3, 6,  '2012-03-01 04:30:00'),
        (4, 7,  '2012-03-01 04:35:00'),
        (5, 8,  '2012-03-01 04:38:00'),
        (6, 9,  '2012-03-01 04:58:00'),
        (6, 10, '2012-03-01 04:59:00');

    with ProximityOrderCascade(CustomerId, OrderId, ProximateOrderId, MinutesDifference, OrderDate, ProximateOrderDate) as
    (
        select o.customerid, o.orderid, null, null, o.orderdate, o.orderdate
        from @orders o

        union all

        select o.customerid, o.orderid, p.orderid,
               datediff(minute, p.OrderDate, o.OrderDate),
               o.OrderDate, p.OrderDate
        from ProximityOrderCascade p
        inner join @orders o
            on p.customerid = o.customerid
           and abs(datediff(minute, p.OrderDate, o.OrderDate)) between 0 and 60
           and o.orderid <> p.orderid
        where proximateorderid is null
    )
    select *
    from ProximityOrderCascade
    where not ProximateOrderId is null

From there, you can shape the results into whatever query you need. The results of this query identify only customers 1 and 6 as having "duplicate" orders.

    CustomerId  OrderId  ProximateOrderId  MinutesDifference  OrderDate                ProximateOrderDate
    ----------  -------  ----------------  -----------------  -----------------------  -----------------------
    6           9        10                -1                 2012-03-01 04:58:00.000  2012-03-01 04:59:00.000
    6           10       9                 1                  2012-03-01 04:59:00.000  2012-03-01 04:58:00.000
    1           1        3                 -5                 2012-03-01 01:58:00.000  2012-03-01 02:03:00.000
    1           2        3                 -2                 2012-03-01 02:01:00.000  2012-03-01 02:03:00.000
    1           1        2                 -3                 2012-03-01 01:58:00.000  2012-03-01 02:01:00.000
    1           3        2                 2                  2012-03-01 02:03:00.000  2012-03-01 02:01:00.000
    1           2        1                 3                  2012-03-01 02:01:00.000  2012-03-01 01:58:00.000
    1           3        1                 5                  2012-03-01 02:03:00.000  2012-03-01 01:58:00.000

    (8 row(s) affected)
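As one possible way to turn those rows into a per-customer summary, the Python sketch below collapses them into the set of distinct orders involved for each customer. It is only an illustration: the (CustomerId, OrderId, ProximateOrderId) tuples are copied from the result set above, and the variable names are made up for the example.

```python
from collections import defaultdict

# (CustomerId, OrderId, ProximateOrderId) taken from the result set above.
permutation_rows = [
    (6, 9, 10), (6, 10, 9),
    (1, 1, 3), (1, 2, 3), (1, 1, 2),
    (1, 3, 2), (1, 2, 1), (1, 3, 1),
]

# Collect every order that appears on either side of a proximate pair.
orders_by_customer = defaultdict(set)
for customer_id, order_id, proximate_id in permutation_rows:
    orders_by_customer[customer_id].update((order_id, proximate_id))

# Number of distinct orders per customer that are close to another order.
involved_counts = {c: len(ids) for c, ids in sorted(orders_by_customer.items())}

print(involved_counts)
```

This prints `{1: 3, 6: 2}`: three of customer 1's orders and two of customer 6's orders sit within 60 minutes of another order from the same customer. In SQL the same idea would be a `COUNT(DISTINCT OrderId)` grouped by `CustomerId`.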

+1

Rabid Mar 2 '12 at 16:29


MatBailie · Accepted Answer · 2012-03-02T15:53:26+0000

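    SELECT *,
           CASE WHEN EXISTS (
                    SELECT *
                    FROM orders AS lookup
                    WHERE lookup.customer = orders.customer
                      AND lookup.order_date < orders.order_date
                      -- compare against the outer row's date, one hour back
                      AND lookup.order_date >= DATEADD(hour, -1, orders.order_date)
                )
                THEN 'Duplicate Order'   -- an earlier order exists within the last hour
                ELSE 'Principal Order'
           END AS Order_Status
    FROM orders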

Using EXISTS and a correlated subquery, you can check if there have been any previous orders in the last hour.
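The sketch below is a self-contained way to try the EXISTS check against the sample data. It is an illustration under stated assumptions rather than the answer's own code: it runs on Python's built-in `sqlite3`, so SQLite's `datetime(..., '-1 hour')` replaces T-SQL's `DATEADD(hour, -1, ...)`, and the `has_prior_order` column name is made up for the example.

```python
import sqlite3

# Sample data from the question.
rows = [
    (1, 1, "2012-03-01 01:58:00"), (1, 2, "2012-03-01 02:01:00"),
    (1, 3, "2012-03-01 02:03:00"), (2, 4, "2012-03-01 02:15:00"),
    (3, 5, "2012-03-01 02:18:00"), (3, 6, "2012-03-01 04:30:00"),
    (4, 7, "2012-03-01 04:35:00"), (5, 8, "2012-03-01 04:38:00"),
    (6, 9, "2012-03-01 04:58:00"), (6, 10, "2012-03-01 04:59:00"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer INT, order_number INT, order_date TEXT)")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)

# For each order, check whether the same customer placed an earlier order
# within the preceding hour. Dates are ISO-formatted text, so string
# comparison orders them chronologically.
flagged = conn.execute("""
    SELECT o.order_number,
           EXISTS (
               SELECT 1
               FROM orders AS lookup
               WHERE lookup.customer = o.customer
                 AND lookup.order_date < o.order_date
                 AND lookup.order_date >= datetime(o.order_date, '-1 hour')
           ) AS has_prior_order
    FROM orders AS o
    ORDER BY o.order_number
""").fetchall()

duplicates = [number for number, has_prior in flagged if has_prior]
print(duplicates)
```

This prints `[2, 3, 10]`, one row per duplicate order with no permutations. Note that the check chains: an order placed within an hour of an earlier duplicate is flagged too, even if it is more than an hour after the first order in the run.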
