Find overlapping date ranges in PostgreSQL

Question

Is this correct?

SELECT * FROM contract JOIN team USING (name_team) JOIN player USING(name_player) WHERE name_team = ? AND DATE_PART('YEAR',date_join)>= ? AND DATE_PART('YEAR',date_leave)<= ?

My contract table has the name of the player, the name of the team, and the dates when the player joined and left the club.
I want to write a function that lists all players who were on the team during certain years.
The query above does not seem to work ...

+9

sql php postgresql overlap date-range

aocferreira Dec 18 '10 at 23:22

2 answers

The currently accepted answer does not answer the question, and it is wrong in principle. a BETWEEN x AND y translates to:

 a >= x AND a <= y

This includes the upper bound, while you typically need to exclude it:

 a >= x AND a < y

With dates you can easily adjust: for the year 2009, use '2009-12-31' as the upper bound.
But it is not as simple with timestamps, which allow fractional digits. Modern versions of Postgres use an 8-byte integer internally to store up to 6 fractional digits of a second (µs resolution). Knowing this, we could still make it work, but that is not intuitive and depends on an implementation detail. Bad idea.
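To make the boundary problem concrete, here is a minimal Python sketch that models the two predicates on timestamps. The helper names `between` and `half_open` are hypothetical and only mirror the SQL expressions above; this is an illustration of the logic, not Postgres code.

```python
from datetime import datetime, timedelta


def between(a, x, y):
    """Model of SQL `a BETWEEN x AND y`: both bounds are included."""
    return x <= a <= y


def half_open(a, x, y):
    """Model of `a >= x AND a < y`: the upper bound is excluded."""
    return x <= a < y


start_2009 = datetime(2009, 1, 1)
start_2010 = datetime(2010, 1, 1)
last_second = datetime(2009, 12, 31, 23, 59, 59)
# A timestamp with fractional seconds in the final second of 2009.
late_event = last_second + timedelta(microseconds=500_000)

# Using the start of 2010 as inclusive upper bound admits a 2010 timestamp.
admits_2010 = between(start_2010, start_2009, start_2010)

# Using "the last second of 2009" as inclusive upper bound misses late_event.
finds_late_event = between(late_event, start_2009, last_second)

# The half-open test rejects 2010-01-01 00:00 and still finds late_event.
half_open_results = (
    half_open(start_2010, start_2009, start_2010),
    half_open(late_event, start_2009, start_2010),
)
```

Under these assumptions, the inclusive test fails in one direction or the other depending on which upper bound is picked, while the half-open test needs no knowledge of the timestamp resolution.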

Moreover, a BETWEEN x AND y does not find overlapping ranges. For a stored range from a to b and a search period from x (included) to y (excluded), we need:

 b >= x AND a < y

And players who never left are not covered yet.
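The overlap condition can be checked against a few sample contracts. The following Python sketch uses a hypothetical helper `overlaps` in which `a` and `b` stand for date_join and date_leave and `x` and `y` for the bounds of the searched period. Treating a missing `b` (`None`) as "still on the team" is an assumption that anticipates the NULL special case handled further down.

```python
from datetime import date
from typing import Optional


def overlaps(a: date, b: Optional[date], x: date, y: date) -> bool:
    """Model of `b >= x AND a < y`, with b = None meaning "never left"."""
    return a < y and (b is None or b >= x)


year_start = date(2009, 1, 1)   # x, included
next_year = date(2010, 1, 1)    # y, excluded

# Joined in 2008 and left during 2009: overlaps 2009.
left_mid_year = overlaps(date(2008, 7, 1), date(2009, 3, 15), year_start, next_year)

# Left on the last day of 2008: no overlap.
left_before = overlaps(date(2007, 1, 1), date(2008, 12, 31), year_start, next_year)

# Joined on the first day of 2010: no overlap.
joined_after = overlaps(date(2010, 1, 1), date(2011, 6, 30), year_start, next_year)

# Joined in 2005 and never left: overlaps 2009, although neither
# date falls inside 2009, which a BETWEEN test on either column would miss.
never_left = overlaps(date(2005, 1, 1), None, year_start, next_year)
```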

Correct answer

Assuming the year 2009, I'll rephrase the question without changing its meaning:

"Find all players of the given team who joined before 2010 and did not leave before 2009."

Basic query:

 SELECT p.*
 FROM   team t
 JOIN   contract c USING (name_team)
 JOIN   player p USING (name_player)
 WHERE  t.name_team = ?
 AND    c.date_join < date '2010-01-01'
 AND    c.date_leave >= date '2009-01-01';

But there is more:

If referential integrity is enforced with FK constraints, the table team itself is just noise in the query and can be removed.

Since the same player can leave and rejoin the same team, we also need to fold possible duplicates, for example with DISTINCT.

And we may need to provide for a special case: players who never left. Assuming those players have NULL in date_leave:

"A player who is not known to have left is assumed to be playing for the team to this day."

Refined query:

 SELECT DISTINCT p.*
 FROM   contract c
 JOIN   player p USING (name_player)
 WHERE  c.name_team = ?
 AND    c.date_join < date '2010-01-01'
 AND   (c.date_leave >= date '2009-01-01' OR c.date_leave IS NULL);

Operator precedence works against us: AND binds before OR. We need parentheses.
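Python's `and` and `or` follow the same precedence rule, so the effect of dropping the parentheses can be sketched with plain booleans. The variable names below are hypothetical stand-ins for the conditions of the refined query, evaluated for a contract row that belongs to a different team and has no leave date.

```python
team_matches = False      # c.name_team = ?  (row is for another team)
joined_in_time = True     # c.date_join < date '2010-01-01'
left_late_enough = False  # c.date_leave >= date '2009-01-01'
never_left = True         # c.date_leave IS NULL

# Without parentheses the expression parses as (A and B and C) or D,
# so every row with a NULL date_leave qualifies, whatever the team.
without_parens = team_matches and joined_in_time and left_late_enough or never_left

# With parentheses the team filter applies to both alternatives.
with_parens = team_matches and joined_in_time and (left_late_enough or never_left)
```

Here `without_parens` is true and `with_parens` is false, so the unparenthesized form would return players of other teams.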

Related answer with optimized DISTINCT (if duplicates are common):

Many-to-Many Table - Poor Performance

Typically, names of natural persons are not unique, and a surrogate primary key is used. But name_player is evidently the primary key of player. If all you need is the player names, we don't need the table player in the query either:

 SELECT DISTINCT name_player
 FROM   contract
 WHERE  name_team = ?
 AND    date_join < date '2010-01-01'
 AND   (date_leave >= date '2009-01-01' OR date_leave IS NULL);

SQL `OVERLAPS`

The manual:

OVERLAPS automatically takes the earlier value of the pair as the start. Each time period is considered to represent the half-open interval start <= time < end, unless start and end are equal, in which case it represents that single point in time.

To take care of potential NULL values, COALESCE seems easiest:

 SELECT DISTINCT name_player
 FROM   contract
 WHERE  name_team = ?
 AND    (date_join, COALESCE(date_leave, CURRENT_DATE)) OVERLAPS
        (date '2009-01-01', date '2010-01-01');  -- upper bound excluded
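The rules quoted from the manual can be modelled in a few lines of Python. This is only a sketch of the documented behaviour, not how Postgres implements it. `sql_overlaps` and `played_in` are hypothetical helpers, and `today` is passed in explicitly as a stand-in for CURRENT_DATE.

```python
from datetime import date
from typing import Optional


def sql_overlaps(s1: date, e1: date, s2: date, e2: date) -> bool:
    """Model of `(s1, e1) OVERLAPS (s2, e2)` as described in the manual."""
    # The earlier value of each pair is taken as the start.
    s1, e1 = min(s1, e1), max(s1, e1)
    s2, e2 = min(s2, e2), max(s2, e2)
    if s1 == e1 and s2 == e2:      # two single points in time
        return s1 == s2
    if s1 == e1:                   # point against half-open interval
        return s2 <= s1 < e2
    if s2 == e2:
        return s1 <= s2 < e1
    return s1 < e2 and s2 < e1     # two half-open intervals


def played_in(date_join: date, date_leave: Optional[date],
              period_start: date, period_end: date, today: date) -> bool:
    """Model of the query predicate, with COALESCE(date_leave, today)."""
    end = date_leave if date_leave is not None else today
    return sql_overlaps(date_join, end, period_start, period_end)


y2009 = (date(2009, 1, 1), date(2010, 1, 1))
today = date(2013, 3, 8)  # arbitrary fixed "current date" for the example

still_playing = played_in(date(2005, 1, 1), None, *y2009, today=today)
left_on_jan_1 = played_in(date(2008, 1, 1), date(2009, 1, 1), *y2009, today=today)
swapped_pair = sql_overlaps(date(2009, 6, 1), date(2009, 2, 1), *y2009)
single_point = sql_overlaps(date(2009, 1, 1), date(2009, 1, 1), *y2009)
```

One consequence of the half-open rule: a contract with date_leave equal to '2009-01-01' does not overlap 2009 under OVERLAPS, whereas the earlier condition date_leave >= date '2009-01-01' still counts it.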

Range type with index support

In Postgres 9.2 or later, you can also work with actual range types:

 SELECT DISTINCT name_player
 FROM   contract
 WHERE  name_team = ?
 AND    daterange(date_join, date_leave) &&
        daterange '[2009-01-01,2010-01-01)';  -- upper bound excluded

Range types add some overhead and occupy more space. 2 x date = 8 bytes; 1 x daterange = 14 bytes on disk or 17 bytes in RAM. But in combination with the overlap operator && the query can be supported by a GiST index.

Also, there is no need to special-case NULL values. NULL means "open bound" in a range type, which is exactly what we need. The table definition does not even need to change: we can create the range type on the fly and support the query with a matching expression index:

 CREATE INDEX contract_daterange_idx ON contract USING gist (daterange(date_join, date_leave));

Related:

Average Stock History Table

+60

Erwin Brandstetter Mar 08 '13 at 23:48

Scott Marlowe · Accepted Answer · 2010-12-19 05:33

Why not use BETWEEN without the DATE_PART stuff:

 WHERE datefield BETWEEN '2009-10-10 00:00:00' AND '2009-10-11 00:00:00'

or something like that?
