Getting all consecutive lines that differ in a specific value?

Question

I am trying to do something that involves matching consecutive rows: I want to group rows whose values differ by a specific number. For example, let's say I have this table:

    CREATE TABLE #TEMP (A int, B int)

    -- Sample table
    INSERT INTO #TEMP VALUES
    (3,1), (3,2), (3,3), (3,4), (5,1), (6,1),
    (7,2), (8,3), (8,4), (8,5), (8,6)

    SELECT * FROM #TEMP

    DROP TABLE #TEMP

Say I need to group all rows that have the same value for A and whose B values differ by 1. The output I am after is:

    A   B   GroupNo
    3   1   1
    3   2   1
    3   3   1
    3   4   1
    5   1   2
    6   1   3
    7   2   4
    8   3   5
    8   4   5
    8   5   5
    8   6   5

The rows (3,1), (3,2), (3,3), (3,4) form one group and (8,3), (8,4), (8,5), (8,6) form another, because within each set A is the same and consecutive B values differ by 1. Here is my attempt:

    CREATE TABLE #TEMP (A int, B int)

    -- Sample table
    INSERT INTO #TEMP VALUES
    (3,1), (3,2), (3,3), (3,4), (5,1), (6,1),
    (7,2), (8,3), (8,4), (8,5), (8,6)

    -- Assign row numbers and perform a left join
    -- so that we can compare consecutive rows
    SELECT ROW_NUMBER() OVER (ORDER BY A ASC) ID, *
    INTO #TEMP2
    FROM #TEMP

    ;WITH CTE AS
    (
        SELECT X.A XA, X.B XB, Y.A YA, Y.B YB
        FROM #TEMP2 X
        LEFT JOIN #TEMP2 Y
        ON X.ID = Y.ID - 1
        WHERE X.A = Y.A AND X.B = Y.B - 1
    )
    SELECT XA, XB
    INTO #GROUPS
    FROM CTE
    UNION
    SELECT YA, YB
    FROM CTE
    ORDER BY XA ASC

    -- Finally assign group numbers
    SELECT X.XA, X.XB, Y.GID
    FROM #GROUPS X
    INNER JOIN
    (
        SELECT XA, ROW_NUMBER() OVER (ORDER BY XA ASC) GID
        FROM #GROUPS Y
        GROUP BY XA
    ) Y
    ON X.XA = Y.XA

    DROP TABLE #TEMP
    DROP TABLE #TEMP2
    DROP TABLE #GROUPS

I will be running this on a large table (about 30 million rows), so I was hoping there is a better way that also works for arbitrary differences (for example, not only a difference of 1 but maybe 2 or 3, which I will later build into a procedure). Any suggestions on whether my approach is wrong and whether it can be improved?

+6

sql sql-server tsql sql-server-2008

Legend Oct 21 '11 at 20:21

2 answers

For the case where they differ by 1, you can use:

    ;WITH T AS
    (
        SELECT *,
               B - DENSE_RANK() OVER (PARTITION BY A ORDER BY B) AS Grp
        FROM #TEMP
    )
    SELECT A,
           B,
           DENSE_RANK() OVER (ORDER BY A, Grp) AS GroupNo
    FROM T
    ORDER BY A, Grp
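To see why subtracting the rank works, it can help to look at the intermediate columns. Within one value of A, both B and its dense rank go up by 1 along a consecutive run, so their difference stays constant; a gap in B makes the difference jump. The sketch below uses a hypothetical table #DEMO (not part of the question) with a gap added on purpose, because the question's sample data has no gap inside any single A:

```sql
-- #DEMO is a made-up table for illustration only
CREATE TABLE #DEMO (A int, B int)

INSERT INTO #DEMO VALUES
(3,1), (3,2), (3,3),   -- first run for A = 3
(3,7), (3,8),          -- second run for A = 3, after a gap
(5,1)

SELECT A,
       B,
       DENSE_RANK() OVER (PARTITION BY A ORDER BY B)     AS RankInA,
       B - DENSE_RANK() OVER (PARTITION BY A ORDER BY B) AS Grp
FROM #DEMO
ORDER BY A, B

-- A  B  RankInA  Grp
-- 3  1  1        0
-- 3  2  2        0
-- 3  3  3        0
-- 3  7  4        3
-- 3  8  5        3
-- 5  1  1        0

DROP TABLE #DEMO
```

Grp is only meaningful together with A, which is why the outer query ranks by (A, Grp) to produce the final group numbers.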

And in general

    DECLARE @Interval INT = 2

    ;WITH T AS
    (
        SELECT *,
               B/@Interval - DENSE_RANK() OVER (PARTITION BY A, B%@Interval ORDER BY B) AS Grp
        FROM #TEMP
    )
    SELECT A,
           B,
           DENSE_RANK() OVER (ORDER BY A, B%@Interval, Grp) AS GroupNo
    FROM T
    ORDER BY A, GroupNo

+3

Martin Smith Oct 21 '11 at 20:32

Mikael Eriksson · Accepted Answer · 2011-10-21T20:34:04+0000

    declare @Diff int = 1

    ;with C as
    (
      select A, B,
             row_number() over(partition by A order by B) as rn
      from #TEMP
    ),
    R as
    (
      select C.A, C.B, 1 as G, C.rn
      from C
      where C.rn = 1
      union all
      select C.A, C.B, G + case when C.B-R.B <= @Diff then 0 else 1 end, C.rn
      from C
        inner join R
          on R.rn + 1 = C.rn and
             R.A = C.A
    )
    select A, B,
           dense_rank() over(order by A, G) as G
    from R
    order by A, G
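One thing to watch for on a large table, as far as I can tell: the recursive CTE advances one row per value of A at each recursion level, and SQL Server stops a recursive CTE after 100 levels by default. If any single A can have more than about 100 rows, the query would presumably need the MAXRECURSION hint. A sketch of the same query with the hint added, run against the question's sample table (the hint is my addition, not part of the answer above):

```sql
CREATE TABLE #TEMP (A int, B int)

INSERT INTO #TEMP VALUES
(3,1), (3,2), (3,3), (3,4), (5,1), (6,1),
(7,2), (8,3), (8,4), (8,5), (8,6)

declare @Diff int = 1

;with C as
(
  -- number the rows inside each A, ordered by B
  select A, B,
         row_number() over(partition by A order by B) as rn
  from #TEMP
),
R as
(
  -- anchor: the first row of every A starts group 1
  select C.A, C.B, 1 as G, C.rn
  from C
  where C.rn = 1
  union all
  -- step: move to the next row of the same A; start a new group
  -- when the gap to the previous B is larger than @Diff
  select C.A, C.B, R.G + case when C.B - R.B <= @Diff then 0 else 1 end, C.rn
  from C
    inner join R
      on R.rn + 1 = C.rn and
         R.A = C.A
)
select A, B,
       dense_rank() over(order by A, G) as G
from R
order by A, G
option (maxrecursion 0)  -- 0 removes the default limit of 100 levels

DROP TABLE #TEMP
```

With no limit, a query that accidentally recurses forever will not be stopped by the server, so a finite value (the maximum is 32767) may be the safer choice if the largest number of rows per A is known.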
