MySQL: counting rows with close timestamps

I need to count runs of rows whose timestamps are close to each other, but not necessarily within a fixed time interval.

That is, the rows are not grouped by hour or minute, but by how close each row's timestamp is to the timestamp of the next row. If the next row is within "x" seconds/minutes, it joins the current group; otherwise a new group starts.

Given this data:

+----+---------+---------------------+
| id | item_id | event_date          |
+----+---------+---------------------+
|  1 |       1 | 2013-05-17 11:59:59 |
|  2 |       1 | 2013-05-17 12:00:00 |
|  3 |       1 | 2013-05-17 12:00:02 |
|  4 |       1 | 2013-05-17 12:00:03 |
|  5 |       3 | 2013-05-17 14:05:00 |
|  6 |       3 | 2013-05-17 14:05:01 |
|  7 |       3 | 2013-05-17 15:30:00 |
|  8 |       3 | 2013-05-17 15:30:01 |
|  9 |       3 | 2013-05-17 15:30:02 |
| 10 |       1 | 2013-05-18 09:12:00 |
| 11 |       1 | 2013-05-18 09:13:30 |
| 12 |       1 | 2013-05-18 09:13:45 |
| 13 |       1 | 2013-05-18 09:14:00 |
| 14 |       2 | 2013-05-20 15:45:00 |
| 15 |       2 | 2013-05-20 15:45:03 |
| 16 |       2 | 2013-05-20 15:45:10 |
| 17 |       2 | 2013-05-23 07:36:00 |
| 18 |       2 | 2013-05-23 07:36:10 |
| 19 |       2 | 2013-05-23 07:36:12 |
| 20 |       2 | 2013-05-23 07:36:15 |
| 21 |       1 | 2013-05-24 11:55:00 |
| 22 |       1 | 2013-05-24 11:55:02 |
+----+---------+---------------------+

Desired results:

+---------+-------+---------------------+
| item_id | total | last_date_in_group  |
+---------+-------+---------------------+
|       1 |     4 | 2013-05-17 12:00:03 |
|       3 |     2 | 2013-05-17 14:05:01 |
|       3 |     3 | 2013-05-17 15:30:02 |
|       1 |     4 | 2013-05-18 09:14:00 |
|       2 |     3 | 2013-05-20 15:45:10 |
|       2 |     4 | 2013-05-23 07:36:15 |
|       1 |     2 | 2013-05-24 11:55:02 |
+---------+-------+---------------------+
2 answers

This is a bit complicated. First, you need the time of the next event for each record. The following subquery adds the time of the next event for the same item, provided it falls within the bounds:

    select t.*,
           (select t2.event_date
            from t t2
            where t2.item_id = t.item_id and
                  t2.event_date > t.event_date and
                  <date comparison here>
            order by t2.event_date
            limit 1
           ) as nexted
    from t

This uses a correlated subquery. <date comparison here> stands for whatever date condition you want to apply. When no record matches, the value is NULL.

Now, with this information (nexted), there is a trick to get the grouping. For any record, find the first event at or after it for which nexted is NULL. That event is the last one in the series, so its date identifies the group. Unfortunately, this requires two levels of nested correlated subqueries (or a join with aggregation). The result looks a bit cumbersome:

    -- "grouping" is a reserved word in MySQL 8.0+, so the alias here is grp
    select item_id, grp,
           min(event_date) as start_date,
           max(event_date) as end_date,
           count(*) as num_dates
    from (select t.*,
                 (select min(t2.event_date)
                  from (select t1.*,
                               (select t3.event_date
                                from t t3
                                where t3.item_id = t1.item_id and
                                      t3.event_date > t1.event_date and
                                      <date comparison here>
                                order by t3.event_date
                                limit 1
                               ) as nexted
                        from t t1
                       ) t2
                  -- correlate with the outer row: same item, at or after it
                  where t2.nexted is null and
                        t2.item_id = t.item_id and
                        t2.event_date >= t.event_date
                 ) as grp
          from t
         ) s
    group by item_id, grp;
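On MySQL 8.0 or later, the same "start a new group when the gap is too large" idea can probably be written more simply with window functions. What follows is only a minimal sketch under stated assumptions, not the method from the answer above: the table name item_events is hypothetical, the columns are those from the question, and the 120-second threshold is picked purely for illustration.

```sql
-- Sketch only: assumes MySQL 8.0+ (CTEs and window functions).
-- item_events is a hypothetical table name; 120 seconds is an illustrative threshold.
WITH flagged AS (
    SELECT id,
           item_id,
           event_date,
           -- is_start = 1 when the row opens a new group: either it is the first
           -- row of its item (LAG returns NULL) or the gap exceeds the threshold.
           CASE
               WHEN TIMESTAMPDIFF(
                        SECOND,
                        LAG(event_date) OVER (PARTITION BY item_id
                                              ORDER BY event_date, id),
                        event_date) <= 120
               THEN 0
               ELSE 1
           END AS is_start
    FROM item_events
),
numbered AS (
    SELECT item_id,
           event_date,
           -- The running total of start flags gives each group a number per item.
           SUM(is_start) OVER (PARTITION BY item_id
                               ORDER BY event_date, id) AS grp
    FROM flagged
)
SELECT item_id,
       COUNT(*)        AS total,
       MAX(event_date) AS last_date_in_group
FROM numbered
GROUP BY item_id, grp
ORDER BY last_date_in_group;
```

Unlike a fixed window measured from the first row, this compares each row only with the row before it, so a long chain of closely spaced events stays in one group. On the sample data, the largest gap inside a desired group is 90 seconds (ids 10 and 11), so any threshold from 90 seconds up to the smallest gap between groups should give the grouping shown in the question.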

How about approaching it this way: for each individual record, find the records that follow it within the interval, and then group by the latest event date found for each record. This relies on a fixed time interval (5 minutes in my example).

    SELECT t.item_id,
           MAX(t.total) AS total,
           t.last_date_in_group
    FROM (
        SELECT t1.item_id,
               COUNT(*) AS total,
               COALESCE(GREATEST(t1.event_date, MAX(t2.event_date)), t1.event_date) AS last_date_in_group
        FROM table_name t1
        LEFT JOIN table_name t2
               -- match on item_id too, so events of different items are not merged
               ON t2.item_id = t1.item_id
              AND t2.event_date BETWEEN t1.event_date AND t1.event_date + INTERVAL 5 MINUTE
        GROUP BY t1.id, t1.item_id
    ) t
    GROUP BY t.item_id, t.last_date_in_group

Source: https://habr.com/ru/post/1481460/

