Calculating customer trips from transaction dates

I am mainly a T-SQL user. I am having trouble completing a task and would be grateful for any recommendations. Apologies in advance for any mistakes; English is not my native language.

I have a table with a lot of transactions. For simplicity, say it has only two columns: CUSTOMER_ID, which identifies my client, and DATE, which is the date of the transaction.

My clients make many transactions while they are in the city, but then weeks, months or even years can pass before they return and start making transactions again. I would like to identify each of these "trips" and group the transactions by trip, so that I can then compute breakdowns such as the duration of each trip, the number of transactions, and so on.

I would like to treat any transaction that occurs after an idle period of 10 days as the start of a new trip.

Let me explain my question with a simple example:

This is my transaction table:

+-------------+------------+
| CUSTOMER_ID |    DATE    |
+-------------+------------+
| JHON        | 01-01-2016 |
| JHON        | 01-02-2016 |
| PEDRO       | 01-02-2016 |
| JHON        | 01-05-2016 |
| MIKE        | 01-05-2016 |
| MIKE        | 01-10-2016 |
| JHON        | 01-07-2016 |
| …           | …          |
| JHON        | 02-15-2016 |
| JHON        | 02-18-2016 |
| MIKE        | 02-19-2016 |
| MIKE        | 02-19-2016 |
+-------------+------------+

So far I have written this query to number each customer's visits:

SELECT
    CUSTOMER_ID,
    DATE,
    ROW_NUMBER() OVER(PARTITION BY CUSTOMER_ID ORDER BY DATE) as VISIT_NUM

FROM
    TRANSACTIONS
WHERE
    CUSTOMER_ID IN ('JHON','MIKE','PEDRO')

Running this query will produce a result similar to this:

+-------------+------------+-----------+
| CUSTOMER_ID |    DATE    | VISIT_NUM |
+-------------+------------+-----------+
| JHON        | 01-01-2016 |         1 |
| JHON        | 01-02-2016 |         2 |
| JHON        | 01-07-2016 |         3 |
| JHON        | 02-15-2016 |         4 |
| JHON        | 02-18-2016 |         5 |
| MIKE        | 01-05-2016 |         1 |
| MIKE        | 01-10-2016 |         2 |
| MIKE        | 02-19-2016 |         3 |
| MIKE        | 02-19-2016 |         4 |
| PEDRO       | 01-02-2016 |         1 |
+-------------+------------+-----------+

What I am looking for is a result like this, with one row per trip:

+-------------+----------+---------------+-------------+---------------+--------------+
| CUSTOMER_ID | TRIP_NUM | TRIP_START_DT | TRIP_END_DT | TRIP_DURATION | TRANSACTIONS |
+-------------+----------+---------------+-------------+---------------+--------------+
| JHON        |        1 | 01-01-2016    | 01-07-2016  |             7 |            3 |
| JHON        |        2 | 02-15-2016    | 02-18-2016  |             3 |            2 |
| MIKE        |        1 | 01-05-2016    | 01-10-2016  |             5 |            2 |
| MIKE        |        2 | 02-19-2016    | 02-19-2016  |             1 |            2 |
| PEDRO       |        1 | 01-02-2016    | 01-02-2016  |             1 |            1 |
+-------------+----------+---------------+-------------+---------------+--------------+
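
To make the 10-day rule concrete, the gap between two consecutive transactions can be checked with DATEDIFF. The snippet below is only a sketch: it uses two pairs of JHON's dates from the sample table, and it assumes that "idle" means a day difference strictly greater than 10. The names PrevDate, NextDate, GapDays and Verdict are made up for illustration.

```sql
-- Gap in days between pairs of consecutive JHON transactions from the sample.
-- Assumption: a gap strictly greater than 10 days starts a new trip.
SELECT
    v.PrevDate,
    v.NextDate,
    DATEDIFF(day, v.PrevDate, v.NextDate) AS GapDays,
    CASE WHEN DATEDIFF(day, v.PrevDate, v.NextDate) > 10
         THEN 'new trip'
         ELSE 'same trip'
    END AS Verdict
FROM (VALUES
    (CAST('2016-01-05' AS date), CAST('2016-01-07' AS date)),
    (CAST('2016-01-07' AS date), CAST('2016-02-15' AS date))
) AS v (PrevDate, NextDate);
```

The first pair is 2 days apart and stays in the same trip; the second is 39 days apart and starts a new one.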



Thanks to Reddit user nvarscar!

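The query below flags each transaction that follows an idle gap of more than 10 days and turns those flags into trip numbers: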

DECLARE @t TABLE 
    ([CUSTOMER_ID] varchar(5), [DATE] datetime)
;

INSERT INTO @t
    ([CUSTOMER_ID], [DATE])
VALUES
    ('JHON', '2016-01-01 00:00:00'),
    ('JHON', '2016-01-02 00:00:00'),
    ('PEDRO', '2016-01-02 00:00:00'),
    ('JHON', '2016-01-05 00:00:00'),
    ('MIKE', '2016-01-05 00:00:00'),
    ('MIKE', '2016-01-10 00:00:00'),
    ('JHON', '2016-01-07 00:00:00'),
    ('JHON', '2016-02-15 00:00:00'),
    ('JHON', '2016-02-18 00:00:00'),
    ('MIKE', '2016-02-19 00:00:00'),
    ('MIKE', '2016-02-19 00:00:00'),
    ('JHON', '2016-02-01 00:00:00'),
    ('JHON', '2016-02-02 00:00:00'),
    ('PEDRO', '2016-03-02 00:00:00'),
    ('JHON', '2016-03-05 00:00:00'),
    ('MIKE', '2016-05-05 00:00:00'),
    ('MIKE', '2016-05-10 00:00:00'),
    ('JHON', '2016-03-07 00:00:00'),
    ('JHON', '2016-04-15 00:00:00'),
    ('JHON', '2016-04-18 00:00:00'),
    ('MIKE', '2016-06-19 00:00:00'),
    ('MIKE', '2016-06-19 00:00:00')
;


-- Step 1: collapse duplicate rows so each customer has one row per date
WITH CTE1 AS (
SELECT 
  [CUSTOMER_ID]
, [DATE]
, COUNT(*) AS Transactions
FROM @t
GROUP BY 
  [CUSTOMER_ID]
, [DATE]
)
-- Step 2: days since the customer's previous transaction date (NULL for the first one)
, CTE2 AS (
SELECT 
  [CUSTOMER_ID]
, [DATE]
, Transactions
, DATEDIFF(day,LAG([DATE]) OVER (PARTITION BY [CUSTOMER_ID] ORDER BY [DATE]),[DATE]) AS DaysSinceLastTransaction
FROM CTE1
)
-- Step 3: mark every date that follows an idle gap of more than 10 days
, CTE3 AS (
SELECT 
  [CUSTOMER_ID]
, [DATE]
, Transactions
, CASE WHEN DaysSinceLastTransaction > 10 THEN 1 ELSE 0 END AS TripTag --Here we set the idle tag
FROM CTE2
)
-- Step 4: a running sum of the tags gives each trip its own number (0 for the first trip)
, CTE4 AS (
SELECT 
  [CUSTOMER_ID]
, [DATE]
, Transactions
, SUM(TripTag) OVER (PARTITION BY [CUSTOMER_ID] ORDER BY [DATE] ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS TripTag
FROM CTE3
)
-- Step 5: aggregate per customer and trip
SELECT 
  [CUSTOMER_ID]
, TripTag+1 AS TripNumber
, MIN ([DATE]) AS TripStartDate
, MAX ([DATE]) AS TripEndDate
, DATEDIFF(day, MIN ([DATE]), MAX ([DATE])) AS TripDuration -- 0 when the trip starts and ends on the same day
, SUM(Transactions) AS Transactions
FROM CTE4
GROUP BY [CUSTOMER_ID], TripTag
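
To see how the running SUM turns the idle flags into trip numbers, it can help to look at the intermediate values for a single customer. This is a reduced sketch of the same idea, not part of the original answer: the table variable @j and the column names DaysSinceLast, IsNewTrip and TripNumber are made up for illustration, and the dates are JHON's rows from the question's transaction table.

```sql
-- Hypothetical single-customer sample: JHON's dates from the question.
DECLARE @j TABLE ([DATE] date);

INSERT INTO @j ([DATE]) VALUES
    ('2016-01-01'), ('2016-01-02'), ('2016-01-05'),
    ('2016-01-07'), ('2016-02-15'), ('2016-02-18');

WITH Gaps AS (
    SELECT
        [DATE],
        -- NULL on the first row, because there is no previous date
        DATEDIFF(day, LAG([DATE]) OVER (ORDER BY [DATE]), [DATE]) AS DaysSinceLast
    FROM @j
)
SELECT
    [DATE],
    DaysSinceLast,
    CASE WHEN DaysSinceLast > 10 THEN 1 ELSE 0 END AS IsNewTrip,
    -- Each flag of 1 bumps the trip number for that row and all later rows
    1 + SUM(CASE WHEN DaysSinceLast > 10 THEN 1 ELSE 0 END)
            OVER (ORDER BY [DATE]
                  ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS TripNumber
FROM Gaps
ORDER BY [DATE];

-- DATE        DaysSinceLast  IsNewTrip  TripNumber
-- 2016-01-01  NULL           0          1
-- 2016-01-02  1              0          1
-- 2016-01-05  3              0          1
-- 2016-01-07  2              0          1
-- 2016-02-15  39             1          2
-- 2016-02-18  3              0          2
```

The NULL gap on the first row falls through to the ELSE branch, so a customer's first transaction always lands in trip 1.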

Another option, which groups each customer's transactions by calendar month rather than by idle gap:

;with cte as
(
    select cid, datee, datepart(month, datee) as monthh,
           dense_rank() over (partition by cid order by datepart(month, datee)) as samemonth,
           count(0) over (partition by cid, datepart(month, datee)) as cnt
    from #temp
)
, cte1 as
(
    select cid, max(samemonth) as tripnumber, min(datee) as startdate, max(datee) as enddate,
           max(cnt) as numberoftransactions
    from cte
    group by cid, samemonth
)
select *, datediff(day, startdate, dateadd(day, 1, enddate)) as duration
from cte1

Output:

cid   tripnumber startdate      enddate    numberoftransactions duration
JHON    1        2016-01-01    2016-01-07   3                    7
JHON    2        2016-02-15    2016-02-18   2                    4
MIKE    1        2016-01-05    2016-01-10   2                    6
MIKE    2        2016-02-19    2016-02-19   2                    1
PEDRO   1        2016-01-02    2016-01-02   1                    1
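
The query above reads from a temp table #temp that the answer does not show. A minimal setup, assuming cid is a varchar(5) and datee is a date, and loading the rows from the question's ROW_NUMBER result, could look like this:

```sql
-- Hypothetical setup: #temp, cid and datee are the names used by the query
-- above; the column types and the choice of sample rows are assumptions.
CREATE TABLE #temp (cid varchar(5), datee date);

INSERT INTO #temp (cid, datee) VALUES
    ('JHON',  '2016-01-01'),
    ('JHON',  '2016-01-02'),
    ('JHON',  '2016-01-07'),
    ('JHON',  '2016-02-15'),
    ('JHON',  '2016-02-18'),
    ('MIKE',  '2016-01-05'),
    ('MIKE',  '2016-01-10'),
    ('MIKE',  '2016-02-19'),
    ('MIKE',  '2016-02-19'),
    ('PEDRO', '2016-01-02');
```

Because the grouping key is DATEPART(month, ...), a trip that crosses a month boundary would be split in two, while transactions in the same month that are more than 10 days apart (or in different years) would be merged. The month-based query therefore matches the 10-day rule only for data shaped like this sample.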

Source: https://habr.com/ru/post/1648531/

