SQL Server: Lead / Lag analytic function for groups (and not within groups)

Sorry for the long post, but I provided a copy and paste of sample data and a possible approach to the solution below. The corresponding part of the question is at the top of the message (above the horizontal rule).

I have the following table

Dt customer_id buy_time money_spent ------------------------------------------------- 2000-01-04 100 11:00:00.00 2 2000-01-05 100 16:00:00.00 1 2000-01-10 100 13:00:00.00 4 2000-01-10 100 14:00:00.00 3 2000-01-04 200 09:00:00.00 10 2000-01-06 200 10:00:00.00 11 2000-01-06 200 11:00:00.00 5 2000-01-10 200 08:00:00.00 20 

and want to get a request to get this result set

  Dt Dt_next customer_id buy_time money_spent ------------------------------------------------------------- 2000-01-04 2000-01-05 100 11:00:00.00 2 2000-01-05 2000-01-10 100 16:00:00.00 1 2000-01-10 NULL 100 13:00:00.00 4 2000-01-10 NULL 100 14:00:00.00 3 2000-01-04 2000-01-06 200 09:00:00.00 10 2000-01-06 2000-01-10 200 10:00:00.00 11 2000-01-06 2000-01-10 200 11:00:00.00 5 2000-01-10 NULL 200 08:00:00.00 20 

That is: I want every customer ( customer_id ) and every day ( Dt ) to visit the same customer ( Dt_next ) the next day.

I already have one query that gives the last set of results (data and query enclosed below the horizontal rule). However, it includes left outer join functions and two <aggregate dense_rank functions. This approach seems a little awkward to me, and I think there should be a better solution. Any pointers to alternative solutions are highly appreciated! Thanks!

BTW: I am using SQL Server 11, and the table has โ†’ 1 m records.


My request:

  select customer_table.Dt ,customer_table_lead.Dt as Dt_next ,customer_table.customer_id ,customer_table.buy_time ,customer_table.money_spent from ( select #customer_data.* ,dense_rank() over (partition by customer_id order by customer_id asc, Dt asc) as Dt_int from #customer_data ) as customer_table left outer join ( select distinct #customer_data.Dt ,#customer_data.customer_id ,dense_rank() over (partition by customer_id order by customer_id asc, Dt asc)-1 as Dt_int from #customer_data ) as customer_table_lead on ( customer_table.Dt_int=customer_table_lead.Dt_int and customer_table.customer_id=customer_table_lead.customer_id ) 

Sample data:

  create table #customer_data ( Dt date not null, customer_id int not null, buy_time time(2) not null, money_spent float not null ); insert into #customer_data values ('2000-01-04',100,'11:00:00',2); insert into #customer_data values ('2000-01-05',100,'16:00:00',1); insert into #customer_data values ('2000-01-10',100,'13:00:00',4); insert into #customer_data values ('2000-01-10',100,'14:00:00',3); insert into #customer_data values ('2000-01-04',200,'09:00:00',10); insert into #customer_data values ('2000-01-06',200,'10:00:00',11); insert into #customer_data values ('2000-01-06',200,'11:00:00',5); insert into #customer_data values ('2000-01-10',200,'08:00:00',20); 
+6
source share
1 answer

Try this query:

 select cd.Dt , t.Dt_next , cd.customer_id , cd.buy_time , cd.money_spent from ( select Dt , LEAD(Dt) OVER (PARTITION BY customer_id ORDER BY Dt) AS Dt_next , customer_id from ( select distinct Dt, customer_id from #customer_data ) t ) t inner join #customer_data cd on t.customer_id = cd.customer_id and t.Dt = cd.Dt 

Why is the money_spent field of type float? You may have problems with the calculations. Convert it to decimal type.

+11
source

Source: https://habr.com/ru/post/954246/


All Articles