SQL query performance

I need to query a table with several million rows, and I want the query to be as efficient as possible.

Let's assume that we want to control access to a cinema with multiple screens and store each access record as follows:

AccessRecord (TicketId, TicketCreationTimestamp, TheaterId, ShowId, MovieId, SeatId, CheckInTimestamp) 

To simplify, the "Id" columns are of data type "bigint" and the "Timestamp" columns are "datetime". Tickets are sold at any time, and people enter the theater in random order. The primary key (also unique) is TicketId.

I want to get, for each movie, theater, and show (time), the AccessRecord row of the first and the last person who entered the theater to see the movie. If two check-ins happen at the same time, I need just one of them, either one.

My solution would be to concatenate the PK and the aggregated column in a subquery to get the row:

    select AccessRecord.*
    from AccessRecord
    inner join (
        select
            MAX(CONVERT(nvarchar(25), CheckInTimestamp, 121) + CONVERT(varchar(25), TicketId)) as MaxKey,
            MIN(CONVERT(nvarchar(25), CheckInTimestamp, 121) + CONVERT(varchar(25), TicketId)) as MinKey
        from AccessRecord
        group by MovieId, TheaterId, ShowId
    ) as MaxAccess
        on CONVERT(nvarchar(25), CheckInTimestamp, 121) + CONVERT(varchar(25), TicketId) = MaxKey
        or CONVERT(nvarchar(25), CheckInTimestamp, 121) + CONVERT(varchar(25), TicketId) = MinKey

Conversion style 121 is the ODBC canonical date format, yyyy-mm-dd hh:mi:ss.mmm (24h), so sorting the value as a string gives the same result as sorting it as a datetime.

As you can see, this join is not very efficient. Any ideas?


Update with how I tested the various solutions:

I tested all your answers on a real SQL Server 2008 R2 database with a 3M-row table in order to choose the right one.

If I get only the first or the last person to enter:

  • Joe Taras's solution takes 10 seconds.
  • GarethD's solution takes 21 seconds.

If I get the same row, but with the result ordered by the grouping columns:

  • Joe Taras's solution takes 10 seconds.
  • GarethD's solution takes 46 seconds.

If I get both people (first and last), with an ordered result:

  • Joe Taras's solution (using a union) takes 19 seconds.
  • GarethD's solution takes 49 seconds.

The rest of the solutions (even mine) took more than 60 seconds in the first test, so I canceled them.

+4
5 answers

Try the following:

    select a.*
    from AccessRecord a
    where not exists (
        select 'next'
        from AccessRecord a2
        where a2.movieid = a.movieid
          and a2.theaterid = a.theaterid
          and a2.showid = a.showid
          and a2.checkintimestamp > a.checkintimestamp
    )

This selects the row with the latest timestamp for each movie, theater, and show.

The ticket (I suppose) is different for each row.
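The query above returns only the last check-in. A minimal sketch of how the first and last rows could be fetched together, assuming the same NOT EXISTS pattern is simply mirrored and the two halves are combined with UNION (an illustration, not necessarily the exact query that was timed in the question):

```sql
-- Last check-in per (movie, theater, show): no later row exists in the group.
SELECT a.*
FROM AccessRecord a
WHERE NOT EXISTS (
    SELECT 1
    FROM AccessRecord a2
    WHERE a2.MovieId = a.MovieId
      AND a2.TheaterId = a.TheaterId
      AND a2.ShowId = a.ShowId
      AND a2.CheckInTimestamp > a.CheckInTimestamp
)
UNION
-- First check-in per (movie, theater, show): no earlier row exists in the group.
SELECT a.*
FROM AccessRecord a
WHERE NOT EXISTS (
    SELECT 1
    FROM AccessRecord a2
    WHERE a2.MovieId = a.MovieId
      AND a2.TheaterId = a.TheaterId
      AND a2.ShowId = a.ShowId
      AND a2.CheckInTimestamp < a.CheckInTimestamp
)
ORDER BY MovieId, TheaterId, ShowId;
```

UNION (rather than UNION ALL) is used here so that a group with a single check-in is returned once. If several rows in a group share the same first or last CheckInTimestamp, all of them are returned; a tie-break on TicketId would be needed to keep exactly one.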

+1

Using analytic functions, more specifically ROW_NUMBER, may speed up the query, since it should reduce the number of reads:

    WITH CTE AS (
        SELECT TicketId, TicketCreationTimestamp, TheaterId, ShowId, MovieId, SeatId, CheckInTimestamp,
               -- 1 = first check-in in the group
               RowNumber = ROW_NUMBER() OVER(PARTITION BY MovieId, TheaterId, ShowId
                                             ORDER BY CheckInTimestamp, TicketID),
               -- 1 = last check-in in the group
               RowNumber2 = ROW_NUMBER() OVER(PARTITION BY MovieId, TheaterId, ShowId
                                              ORDER BY CheckInTimestamp DESC, TicketID)
        FROM AccessRecord
    )
    SELECT TicketId, TicketCreationTimestamp, TheaterId, ShowId, MovieId, SeatId, CheckInTimestamp
    FROM CTE
    WHERE RowNumber = 1 OR RowNumber2 = 1;

However, as always with optimization, you are best placed to tune your own queries: you have the data to test with and all the execution plans. Try running the query with different indexes; if you display the actual execution plan, SSMS will even suggest indexes that could help your query. I would expect an index on (MovieId, TheaterId, ShowId) that includes CheckInTimestamp as a non-key column to help.
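A sketch of the index described above. The index name is made up for illustration, and whether the optimizer actually picks it up should be checked against the execution plan:

```sql
-- Key columns match the PARTITION BY / GROUP BY columns;
-- CheckInTimestamp is carried as a non-key (included) column.
-- IX_AccessRecord_Movie_Theater_Show is a hypothetical name.
CREATE NONCLUSTERED INDEX IX_AccessRecord_Movie_Theater_Show
    ON AccessRecord (MovieId, TheaterId, ShowId)
    INCLUDE (CheckInTimestamp);
```

A variant worth testing is to make CheckInTimestamp a fourth key column instead of an included one, so that rows within each group are already sorted by check-in time.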

+1

    SELECT R1.*
    FROM AccessRecord R1
    LEFT JOIN AccessRecord R2
        ON R1.MovieId = R2.MovieId
       AND R1.TheaterId = R2.TheaterId
       AND R1.ShowId = R2.ShowId
       AND (R1.CheckInTimestamp < R2.CheckInTimestamp
            OR (R1.CheckInTimestamp = R2.CheckInTimestamp AND R1.TicketId < R2.TicketId))
    WHERE R2.TicketId IS NULL

This selects the last entry based on CheckInTimestamp. If there is a tie on the timestamp, it picks the row with the highest TicketId.

An index on MovieId, TheaterId, and ShowId will help.

Here I learned a trick

0

Add new columns to the table and pre-convert the dates, or join on the PK to a new table that already holds the converted values. Looking up the converted value in a new table, instead of doing the conversion in the join, will greatly speed up your queries. If you can arrange for the access record to carry an integer FK into the lookup table (pre-converted values), you avoid using dates altogether, and everything will run faster.

If you normalize the data set and break it out into a star schema, everything will be even faster.
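One way to pre-convert the values is a persisted computed column, so the concatenated key is stored once instead of being rebuilt for every row in the join. This is only a sketch under assumptions: the column and index names are hypothetical, and it relies on CONVERT with style 121 counting as deterministic, which a persisted computed column requires.

```sql
-- CheckInKey is a hypothetical column name.
-- varchar(23) fits 'yyyy-mm-dd hh:mi:ss.mmm'; varchar(20) fits any bigint.
ALTER TABLE AccessRecord
    ADD CheckInKey AS (CONVERT(varchar(23), CheckInTimestamp, 121)
                       + CONVERT(varchar(20), TicketId)) PERSISTED;
GO  -- batch separator for SSMS/sqlcmd, so the new column exists before it is indexed

-- Hypothetical index name; lets MIN/MAX of the key per group be read from the index.
CREATE NONCLUSTERED INDEX IX_AccessRecord_Group_CheckInKey
    ON AccessRecord (MovieId, TheaterId, ShowId, CheckInKey);
```

With this in place, the MIN/MAX subquery and the join condition can refer to CheckInKey directly instead of repeating the CONVERT expressions.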

0

You may also consider a UNION ALL query instead of this nasty OR. ORs are usually slower than UNION ALL.
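A rough sketch of what that could look like for the query in the question, with the OR in the join condition split into two joins. The CTE name GroupKeys is made up for illustration, and since SQL Server may evaluate the CTE once per reference, whether this is actually faster should be checked against the execution plan:

```sql
-- Min/max composite key (timestamp + TicketId) per movie, theater, and show.
WITH GroupKeys AS (
    SELECT
        MAX(CONVERT(varchar(25), CheckInTimestamp, 121) + CONVERT(varchar(25), TicketId)) AS MaxKey,
        MIN(CONVERT(varchar(25), CheckInTimestamp, 121) + CONVERT(varchar(25), TicketId)) AS MinKey
    FROM AccessRecord
    GROUP BY MovieId, TheaterId, ShowId
)
-- Last check-in of each group.
SELECT a.*
FROM AccessRecord a
INNER JOIN GroupKeys k
    ON CONVERT(varchar(25), a.CheckInTimestamp, 121) + CONVERT(varchar(25), a.TicketId) = k.MaxKey
UNION ALL
-- First check-in of each group.
SELECT a.*
FROM AccessRecord a
INNER JOIN GroupKeys k
    ON CONVERT(varchar(25), a.CheckInTimestamp, 121) + CONVERT(varchar(25), a.TicketId) = k.MinKey
WHERE k.MinKey <> k.MaxKey;  -- a group with one row was already returned by the first branch
```

Because the key ends with the unique TicketId, each key matches exactly one row, so UNION ALL does not produce duplicates once single-row groups are filtered out of the second branch.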

0

Source: https://habr.com/ru/post/1501348/

