Group and count rows by value until it changes

I have a table where messages are stored as they occur. Usually the message "A" appears, and sometimes runs of "A" are separated by a single message "B". Now I want to group the values so that I can analyze them, for example to find the longest "A" band or bands.

I already tried a COUNT(*) OVER query, but it keeps counting across every message instead of restarting when the message changes:

SELECT message, COUNT(*) OVER (ORDER BY Timestamp RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) 

This is my example data:

    Timestamp         Message
    20150329 00:00    A
    20150329 00:01    A
    20150329 00:02    B
    20150329 00:03    A
    20150329 00:04    A
    20150329 00:05    A
    20150329 00:06    B

I need the following output:

    Message    COUNT
    A          2
    B          1
    A          3
    B          1
2 answers

That was an interesting one :)

    ;WITH cte AS (
        SELECT Messages.Message,
               Timestamp,
               ROW_NUMBER() OVER (PARTITION BY Message ORDER BY Timestamp) AS gn,
               ROW_NUMBER() OVER (ORDER BY Timestamp) AS rn
        FROM Messages
    ),
    cte2 AS (
        SELECT Message, Timestamp, gn, rn, gn - rn AS gb
        FROM cte
    ),
    cte3 AS (
        SELECT Message, MIN(Timestamp) AS Ts, COUNT(1) AS Cnt
        FROM cte2
        GROUP BY Message, gb
    )
    SELECT Message, Cnt
    FROM cte3
    ORDER BY Ts

Here is the result:

    Message    Cnt
    A          2
    B          1
    A          3
    B          1

The query could be shorter, but I am posting it this way so that you can see what is happening. The result matches the requested output exactly. The most important part is gn - rn: the idea is to number the rows within each Message partition and, at the same time, number the rows across the whole set. If you subtract one from the other, you get a value that stays constant within each consecutive run of the same message, which serves as the "rank" of each group.

    ;WITH cte AS (
        SELECT Messages.Message,
               Timestamp,
               ROW_NUMBER() OVER (PARTITION BY Message ORDER BY Timestamp) AS gn,
               ROW_NUMBER() OVER (ORDER BY Timestamp) AS rn
        FROM Messages
    ),
    cte2 AS (
        SELECT Message, Timestamp, gn, rn, gn - rn AS gb
        FROM cte
    )
    SELECT * FROM cte2

    Message    Timestamp                  gn    rn    gb
    A          2015-03-29 00:00:00.000    1     1      0
    A          2015-03-29 00:01:00.000    2     2      0
    B          2015-03-29 00:02:00.000    1     3     -2
    A          2015-03-29 00:03:00.000    3     4     -1
    A          2015-03-29 00:04:00.000    4     5     -1
    A          2015-03-29 00:05:00.000    5     6     -1
    B          2015-03-29 00:06:00.000    2     7     -5
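
The remark that the query could be shorter can be illustrated with a condensed variant that computes gn - rn in a single CTE. This is only a sketch, not the answerer's code: it assumes an in-memory SQLite database driven from Python's sqlite3 module (SQLite 3.25 or later, for window functions) in place of SQL Server, and the helper name count_runs and the text timestamp format are made up for the demonstration.

```python
import sqlite3

# Sample data from the question (timestamps stored as sortable text; an assumption).
ROWS = [
    ("2015-03-29 00:00", "A"),
    ("2015-03-29 00:01", "A"),
    ("2015-03-29 00:02", "B"),
    ("2015-03-29 00:03", "A"),
    ("2015-03-29 00:04", "A"),
    ("2015-03-29 00:05", "A"),
    ("2015-03-29 00:06", "B"),
]

# Condensed form of the gn - rn idea: the difference of the two row numbers
# is constant within each consecutive run of the same message.
QUERY = """
WITH numbered AS (
    SELECT Message,
           Timestamp,
           ROW_NUMBER() OVER (PARTITION BY Message ORDER BY Timestamp)
         - ROW_NUMBER() OVER (ORDER BY Timestamp) AS gb
    FROM Messages
)
SELECT Message, COUNT(*) AS Cnt
FROM numbered
GROUP BY Message, gb
ORDER BY MIN(Timestamp)
"""


def count_runs(rows):
    """Hypothetical helper: load rows into SQLite and return (Message, Cnt) per run."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Messages (Timestamp TEXT, Message TEXT)")
    conn.executemany("INSERT INTO Messages VALUES (?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


print(count_runs(ROWS))  # [('A', 2), ('B', 1), ('A', 3), ('B', 1)]
```

The SQL text itself uses only ROW_NUMBER, GROUP BY and ORDER BY on an aggregate, so it should carry over to SQL Server largely unchanged, though that has not been checked here.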

Here is a slightly shorter solution:

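    DECLARE @t TABLE ( d DATE, m CHAR(1) )

    INSERT INTO @t VALUES
        ( '20150301', 'A' ),
        ( '20150302', 'A' ),
        ( '20150303', 'B' ),
        ( '20150304', 'A' ),
        ( '20150305', 'A' ),
        ( '20150306', 'A' ),
        ( '20150307', 'B' );

    -- c1: n = 1 on rows where the message differs from the previous row
    -- c2: running sum of n gives every consecutive run its own number
    WITH c1 AS (
        SELECT d, m, IIF(LAG(m, 1, m) OVER (ORDER BY d) = m, 0, 1) AS n
        FROM @t
    ),
    c2 AS (
        SELECT m, SUM(n) OVER (ORDER BY d) AS n
        FROM c1
    )
    SELECT m, COUNT(*) AS c
    FROM c2
    GROUP BY m, n
    ORDER BY n  -- added: without ORDER BY the row order is not guaranteed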

Output:

    m    c
    A    2
    B    1
    A    3
    B    1

The idea is to get a value of 1 in each row where the message differs from the previous row:

    2015-03-01    A    0
    2015-03-02    A    0
    2015-03-03    B    1
    2015-03-04    A    1
    2015-03-05    A    0
    2015-03-06    A    0
    2015-03-07    B    1

The second step is simply a running sum: the current row's value plus all previous values:

    2015-03-01    A    0
    2015-03-02    A    0
    2015-03-03    B    1
    2015-03-04    A    2
    2015-03-05    A    2
    2015-03-06    A    2
    2015-03-07    B    3

The running sum is the same for every row of a consecutive run, so grouping by the message column together with this calculated column gives one group per run.
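
The question's original goal, finding the longest "A" band or bands, can be built on the same change flag and running sum. A minimal sketch, assuming SQLite via Python's sqlite3 module (3.25 or later) instead of SQL Server; IIF is written as a CASE expression for portability, and the table name t, the CTE name runs, the column names chg and grp, and the helper longest_runs are all hypothetical.

```python
import sqlite3

# Sample data from the answer above (dates kept as sortable text; an assumption).
ROWS = [
    ("20150301", "A"),
    ("20150302", "A"),
    ("20150303", "B"),
    ("20150304", "A"),
    ("20150305", "A"),
    ("20150306", "A"),
    ("20150307", "B"),
]

QUERY = """
WITH c1 AS (
    -- chg = 1 where the message differs from the previous row
    SELECT d, m,
           CASE WHEN LAG(m, 1, m) OVER (ORDER BY d) = m THEN 0 ELSE 1 END AS chg
    FROM t
),
c2 AS (
    -- running sum of chg numbers the consecutive runs
    SELECT d, m, SUM(chg) OVER (ORDER BY d) AS grp
    FROM c1
),
runs AS (
    -- one row per run: message, first date, length
    SELECT m, MIN(d) AS first_d, COUNT(*) AS c
    FROM c2
    GROUP BY m, grp
)
SELECT m, first_d, c
FROM runs
WHERE m = :msg
  AND c = (SELECT MAX(c) FROM runs WHERE m = :msg)
ORDER BY first_d
"""


def longest_runs(rows, message):
    """Hypothetical helper: return every longest run of `message` (ties included)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (d TEXT, m TEXT)")
    conn.executemany("INSERT INTO t VALUES (?, ?)", rows)
    result = conn.execute(QUERY, {"msg": message}).fetchall()
    conn.close()
    return result


print(longest_runs(ROWS, "A"))  # [('A', '20150304', 3)]
```

Comparing against MAX(c) rather than taking only the first row keeps ties, which fits the "band or bands" wording of the question.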


Source: https://habr.com/ru/post/984379/

