Date range intersection in SQL

I have a table in which each row has a start and end date. They can be arbitrarily short or long.

I want to query the total duration of the intersections of all rows with a given start and end date.

How can I do this in MySQL?

Or do I need to select the rows that intersect the query's start and end times, then calculate the actual overlap of each row and sum it on the client side?


To give an example, using milliseconds to make it clearer:

Some rows:

 ROW  START  STOP
 1     1010  1240
 2      950  1040
 3     1120  1121

And we want to know the total time that these rows spent between 1030 and 1100.

The overlap of each row with that window is:

 ROW  INTERSECTION
 1    70
 2    10
 3    0

So, the sum in this example is 80.
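For comparison, the client-side option mentioned above is simple arithmetic: clamp each row to the query window and add up the non-negative lengths. A minimal Python sketch, assuming the rows have already been fetched as (row, start, stop) tuples and that a duration is stop minus start; the function names are hypothetical.

```python
# Client-side overlap: clamp each row to [range_start, range_end] and sum.
# ROWS holds (row, start, stop) in milliseconds, copied from the question.
ROWS = [(1, 1010, 1240), (2, 950, 1040), (3, 1120, 1121)]


def overlap(start: int, stop: int, range_start: int, range_end: int) -> int:
    """Duration shared by [start, stop] and [range_start, range_end]; 0 if disjoint."""
    return max(0, min(stop, range_end) - max(start, range_start))


def total_overlap(rows, range_start: int, range_end: int) -> int:
    """Sum of the per-row overlaps with the query window."""
    return sum(overlap(start, stop, range_start, range_end) for _, start, stop in rows)


if __name__ == "__main__":
    for row, start, stop in ROWS:
        print(row, overlap(start, stop, 1030, 1100))  # 1 70 / 2 10 / 3 0
    print("total", total_overlap(ROWS, 1030, 1100))  # total 80
```

The max(0, ...) is what makes row 3 contribute 0 instead of a negative length; the SQL answers below get the same effect with a WHERE clause.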

4 answers

If the first row of your example should read 70, then, assuming @range_start and @range_end are the parameters of your condition:

    SELECT SUM(LEAST(@range_end, stop) - GREATEST(@range_start, start))
    FROM `Table`
    WHERE @range_start < stop
      AND @range_end > start

Using GREATEST / LEAST together with date functions (for example TIMESTAMPDIFF), you should be able to get what you need working directly with date types.
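To check this query against the question's numbers without a MySQL server, here is a sketch using Python's built-in sqlite3 module. SQLite has no LEAST()/GREATEST(), so its multi-argument scalar min()/max() stand in for them; the table name intervals and the helper names are hypothetical. The WHERE clause is what keeps row 3 (and any other row outside the window) from contributing a negative length.

```python
import sqlite3

# SQLite stand-in for the MySQL query: scalar min()/max() replace LEAST()/GREATEST().
# "intervals" is a hypothetical table name.
QUERY = """
SELECT SUM(min(:range_end, stop) - max(:range_start, start))
FROM intervals
WHERE :range_start < stop AND :range_end > start
"""


def total_overlap(conn: sqlite3.Connection, range_start: int, range_end: int) -> int:
    params = {"range_start": range_start, "range_end": range_end}
    (total,) = conn.execute(QUERY, params).fetchone()
    return total or 0  # SUM over zero rows is NULL


def demo_connection() -> sqlite3.Connection:
    """In-memory table holding the three example rows from the question."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE intervals (row_id INTEGER PRIMARY KEY, start INTEGER, stop INTEGER)")
    conn.executemany("INSERT INTO intervals VALUES (?, ?, ?)",
                     [(1, 1010, 1240), (2, 950, 1040), (3, 1120, 1121)])
    return conn


if __name__ == "__main__":
    print(total_overlap(demo_connection(), 1030, 1100))  # 80
```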

I'm afraid you're out of luck.

Since you do not know how many rows you are "cumulatively intersecting", you need either a recursive solution or an aggregate operator.

The aggregate operator you need is not available, because SQL lacks the data type it would operate on (an interval type, as described in "Temporal Data and the Relational Model").

A recursive solution may be possible, but it would probably be hard to write and hard for other programmers to read, and it is doubtful whether the optimizer could turn such a query into an efficient data access strategy.

Or I misunderstood your question.

There is a fairly interesting solution if you know the maximum time value you will ever have. Create a table containing every number from one up to that maximum:

 millisecond
 -----------
 1
 2
 3
 ...
 1240

Name it time_dimension (this technique is common in dimensional modeling for data warehouses).

Then this:

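    SELECT COUNT(*)
    FROM your_data
    INNER JOIN time_dimension
      ON time_dimension.millisecond >= your_data.start
     AND time_dimension.millisecond < your_data.stop
    WHERE time_dimension.millisecond >= 1030
      AND time_dimension.millisecond < 1100

... will give you the total number of milliseconds of run time between 1030 and 1100. The comparisons are half-open (>= start, < stop) so that each millisecond is counted once; with BETWEEN, which includes both endpoints, the example would come out as 82 instead of 80.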
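A minimal, self-contained sketch of the time_dimension technique, using Python's sqlite3 module in place of MySQL. It fills time_dimension with 1..1240 (the largest stop value in the example, standing in for the "maximum time" the technique relies on) and runs both the half-open query and a BETWEEN variant, to show the extra millisecond per overlapping row. The function and constant names are hypothetical.

```python
import sqlite3


def build(max_ms: int = 1240) -> sqlite3.Connection:
    """In-memory database with the question's rows and a 1..max_ms numbers table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE your_data (start INTEGER, stop INTEGER)")
    conn.executemany("INSERT INTO your_data VALUES (?, ?)",
                     [(1010, 1240), (950, 1040), (1120, 1121)])
    # One row per millisecond, up to the largest value we ever expect to store.
    conn.execute("CREATE TABLE time_dimension (millisecond INTEGER PRIMARY KEY)")
    conn.executemany("INSERT INTO time_dimension VALUES (?)",
                     ((ms,) for ms in range(1, max_ms + 1)))
    return conn


# Half-open: a row covers start, start + 1, ..., stop - 1.
HALF_OPEN = """
SELECT COUNT(*)
FROM your_data
INNER JOIN time_dimension
  ON time_dimension.millisecond >= your_data.start
 AND time_dimension.millisecond < your_data.stop
WHERE time_dimension.millisecond >= ? AND time_dimension.millisecond < ?
"""

# Closed: BETWEEN includes both endpoints, so each overlapping row gains one tick.
CLOSED = """
SELECT COUNT(*)
FROM your_data
INNER JOIN time_dimension
  ON time_dimension.millisecond BETWEEN your_data.start AND your_data.stop
WHERE time_dimension.millisecond BETWEEN ? AND ?
"""


def count_ms(conn: sqlite3.Connection, query: str, range_start: int, range_end: int) -> int:
    return conn.execute(query, (range_start, range_end)).fetchone()[0]


if __name__ == "__main__":
    conn = build()
    print(count_ms(conn, HALF_OPEN, 1030, 1100))  # 80
    print(count_ms(conn, CLOSED, 1030, 1100))     # 82
```

The cost of this approach grows with the resolution: one row per millisecond is fine for this toy range but may not be for real data.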

Of course, whether you can use this method depends on whether you can safely predict the maximum number of milliseconds that will ever be in your data.

As I said, this is often used in data warehousing, and it fits some problems well. For example, I used it for insurance systems where the total number of days between two dates was needed and where the overall date range of the data could easily be estimated (from the earliest client date of birth to a date several years in the future, to cover the end date of any policy being sold).

It may not work for you, but I thought it was worth sharing as an interesting technique!

After you added the example, it became clear that I really had not understood your question.

You are not cumulatively intersecting rows.

Steps that will lead to the solution:

Intersect each row's start and end points with the given start and end points. This can be done with CASE expressions or something similar, along these lines:

    SELECT CASE WHEN startdate < @givenstartdate THEN @givenstartdate ELSE startdate END AS keptstartdate,
           CASE WHEN enddate > @givenenddate THEN @givenenddate ELSE enddate END AS keptenddate
    FROM ...

Handle NULLs and similar edge cases as needed.

Once you have the kept start date and the kept end date, use a date function to calculate the length of the kept interval (which is the overlap of your row with the given time window).

Finally, SELECT SUM() of those lengths, counting only rows that actually overlap the given window (for the others the kept interval would have a negative length).
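A sketch of these steps with Python's sqlite3 module, so it can run without MySQL. Plain integer subtraction stands in for the date function (with MySQL DATETIME columns, something like TIMESTAMPDIFF would take its place); the table name intervals, its startdate/enddate columns and the helper names are hypothetical.

```python
import sqlite3

# Step 1: clamp each row to the window with CASE; step 2: length of the kept interval;
# step 3: SUM. Rows that do not overlap the window are filtered out first.
QUERY = """
SELECT SUM(keptenddate - keptstartdate)
FROM (
    SELECT CASE WHEN startdate < :givenstartdate THEN :givenstartdate ELSE startdate END
               AS keptstartdate,
           CASE WHEN enddate > :givenenddate THEN :givenenddate ELSE enddate END
               AS keptenddate
    FROM intervals
    WHERE startdate < :givenenddate AND enddate > :givenstartdate
) AS kept
"""


def summed_overlap(conn: sqlite3.Connection, givenstartdate: int, givenenddate: int) -> int:
    params = {"givenstartdate": givenstartdate, "givenenddate": givenenddate}
    (total,) = conn.execute(QUERY, params).fetchone()
    return total or 0  # SUM over no rows yields NULL


def demo_connection() -> sqlite3.Connection:
    """In-memory table holding the three example rows from the question."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE intervals (startdate INTEGER, enddate INTEGER)")
    conn.executemany("INSERT INTO intervals VALUES (?, ?)",
                     [(1010, 1240), (950, 1040), (1120, 1121)])
    return conn


if __name__ == "__main__":
    print(summed_overlap(demo_connection(), 1030, 1100))  # 80
```

The CASE expressions do the same clamping as GREATEST/LEAST, so this ends up equivalent to the first answer's query.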

Source: https://habr.com/ru/post/1308858/
