SQL query: need better performance

I have a data-load script in which I build a dynamic SQL query to retrieve data and cache it in our service. A single table holds all the product data: ProductHistory (47 columns, 200,000+ rows, and still growing).

What I need: get the latest products, i.e. the rows with the maximum id, maximum version and maximum changeid.
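To pin down what "latest" means here, a minimal Python sketch. It assumes "latest" means, for each id, the row with the highest version, with changeid used only as a tie-breaker. The Row type, the latest_per_id helper and the sample rows are hypothetical.

```python
from typing import NamedTuple


class Row(NamedTuple):
    id: int
    version: int
    changeid: int


def latest_per_id(rows):
    """Keep, for each id, the row with the highest (version, changeid) pair."""
    best = {}
    for row in rows:
        current = best.get(row.id)
        # Tuples compare element by element: version first, changeid breaks ties.
        if current is None or (row.version, row.changeid) > (current.version, current.changeid):
            best[row.id] = row
    return sorted(best.values())


rows = [Row(1, 1, 5), Row(1, 2, 3), Row(2, 1, 1), Row(2, 2, 2)]
print(latest_per_id(rows))  # → [Row(id=1, version=2, changeid=3), Row(id=2, version=2, changeid=2)]
```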

First try:

    SELECT distinct Product.*
    FROM ProductHistory product
    WHERE product.version = (
        SELECT max(version) FROM ProductHistory p2
        WHERE product.Id = p2.Id
          AND product.changeId = (
              SELECT max(changeid) FROM ProductHistory p3
              WHERE p2.changeId = p3.changeId))

It took more than 2.51 minutes.

Other unsuccessful attempts:

    select distinct product.*
    from ProductHistory product
    where CAST(CAST(id as nvarchar) + '0' + CAST(Version as nvarchar) + '0' + CAST(changeid as nvarchar) as decimal)
        = (select MAX(CAST(CAST(id as nvarchar) + '0' + CAST(Version as nvarchar) + '0' + CAST(changeid as nvarchar) as decimal))
           from ProductHistory p2
           where product.Id = p2.Id)

Basically, it uses the same principle as sorting dates: the numbers are concatenated in order of significance.

For example, 11 July 2007 = 20070711. In our case: Id = 4, version = 127, changeid = 32 => 40127032. The zeros are there so the three numbers don't run together.
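A quick Python check of the concatenation trick (the helper names and values are illustrative only). Because the parts are not fixed-width, a smaller version with a longer changeid can produce a larger number. Zero-padding every part to a fixed width restores the ordering, although in SQL any such computed key still has to be evaluated for every candidate row rather than read directly from an index.

```python
def concat_key(id_: int, version: int, changeid: int) -> int:
    # Mimics CAST(id) + '0' + CAST(version) + '0' + CAST(changeid), then casts to a number.
    return int(f"{id_}0{version}0{changeid}")


def padded_key(id_: int, version: int, changeid: int) -> int:
    # Fixed-width alternative: each part padded to 10 digits (assumes non-negative values < 10**10).
    return int(f"{id_:010d}{version:010d}{changeid:010d}")


print(concat_key(4, 127, 32))  # 40127032, the example from the question
# Same id, version 1 vs version 10: the variable-width key ranks version 1 higher.
print(concat_key(4, 1, 10) > concat_key(4, 10, 1))  # True (wrong order)
print(padded_key(4, 1, 10) > padded_key(4, 10, 1))  # False (correct order)
print((4, 1, 10) > (4, 10, 1))                      # False: plain tuple comparison agrees
```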

But it takes 3.10 minutes!!! :(

So I basically need a way to improve my first query. I also wonder: with this much data, is this the best retrieval speed I should expect?

  • I ran sp_helpindex ProductHistory and found the following index:

    PK_ProductHistoryNew - clustered, unique, primary key located on PRIMARY - Id, Version

  • I wrapped the first query in a stored procedure, but nothing changed.

So what else can we do to improve the performance of this operation?

Thanks, Mani

P.S. I just run these queries in SQL Server Management Studio to see how long they take.

8 answers

Run the query in SQL Server Management Studio and look at the execution plan to see where the bottleneck is. Wherever you see a “Table Scan” or “Index Scan”, the server has to go through all the data to find what it is looking for. Creating appropriate indexes that these operations can use should improve performance.
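SSMS's graphical execution plan is specific to SQL Server. As a portable stand-in, the sketch below uses Python's built-in sqlite3 module and SQLite's EXPLAIN QUERY PLAN to show the same effect: a filter on Id is a full scan until a suitable index exists. The table layout, generated data and index name are hypothetical, and the exact wording of SQLite's plan output varies between versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, trimmed-down ProductHistory table with generated rows.
conn.execute("CREATE TABLE ProductHistory (Id INTEGER, Version INTEGER, ChangeId INTEGER, Name TEXT)")
conn.executemany(
    "INSERT INTO ProductHistory VALUES (?, ?, ?, ?)",
    [(i % 100, i, i % 7, f"p{i}") for i in range(1000)],
)


def plan(sql):
    # The last column of each EXPLAIN QUERY PLAN row is a human-readable description.
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


query = "SELECT * FROM ProductHistory WHERE Id = 4"
before = plan(query)  # expect a SCAN step: every row is read
conn.execute("CREATE INDEX ix_ProductHistory_Id_Version ON ProductHistory (Id, Version)")
after = plan(query)   # expect a SEARCH step that uses the new index
print(before)
print(after)
```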


Some things I see:

  • Is DISTINCT required? With DISTINCT * it is unlikely to help, but it adds the overhead of checking every column for duplicates.
  • Instead of the two subqueries in your WHERE clause, move the logic into a JOIN. The joined subquery should be evaluated only once, whereas I suspect the subqueries in your WHERE are evaluated many times.


    SELECT Product.*
    FROM ProductHistory product
    INNER JOIN (
        SELECT p.Id, MAX(p.Version) AS [MaxVer], MAX(p.ChangeID) AS [MaxChange]
        FROM ProductHistory p
        GROUP BY p.Id
    ) SubQ
        ON  SubQ.Id = product.Id
        AND SubQ.MaxChange = product.ChangeID
        AND SubQ.MaxVer = product.Version

This also requires an index on (Id, Version, ChangeID).
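One caveat, shown with a small SQLite sketch via Python's sqlite3 (the sample rows are invented): the derived table takes MAX(Version) and MAX(ChangeID) independently, so if the newest version of a product does not also carry its highest ChangeID, no row matches both maxima and that product disappears from the result. If ChangeID always grows together with Version, this cannot happen.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ProductHistory (Id INTEGER, Version INTEGER, ChangeId INTEGER)")
# Id 1: the newest version (2) has a lower ChangeId (3) than the older one (5).
conn.executemany("INSERT INTO ProductHistory VALUES (?, ?, ?)",
                 [(1, 1, 5), (1, 2, 3), (2, 1, 1), (2, 2, 2)])

join_query = """
SELECT product.Id, product.Version, product.ChangeId
FROM ProductHistory product
INNER JOIN (
    SELECT p.Id, MAX(p.Version) AS MaxVer, MAX(p.ChangeId) AS MaxChange
    FROM ProductHistory p
    GROUP BY p.Id
) SubQ
    ON  SubQ.Id = product.Id
    AND SubQ.MaxChange = product.ChangeId
    AND SubQ.MaxVer = product.Version
ORDER BY product.Id
"""
rows = conn.execute(join_query).fetchall()
# Only Id 2 survives: for Id 1, MaxVer and MaxChange come from different rows.
print(rows)  # [(2, 2, 2)]
```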


Well, storing everything in one table is a mistake. It is better to keep the latest version in one table and the history in another table with the same structure (since, I think, you are more interested in current products than in old ones). A flawed design leads to lots of workarounds...

Also, avoid DISTINCT: it often hides problems in the query (if duplicates come back, that usually means the query itself can be written better).

Now, the main part: how do you solve your problem? I think you should use grouping, something like this:

    SELECT max(id), max(version), max(changeid)
    FROM ProductHistory p
    WHERE <filter if necessary for old products or anything else>
    GROUP BY version, changeid
    HAVING version = max(version) AND changeid = max(changeid) AND id = max(id)

But looking at your PK, I am surprised that changeid matters at all, since the key consists only of id and version...

I am not sure my query is completely right, because I can't test it, but I think you can run some tests.
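If the primary key really is (Id, Version), as sp_helpindex reports, then each (Id, Version) pair identifies exactly one row, and grouping on Id alone is enough to find each product's latest row. A minimal sketch of that variation, using Python's sqlite3 as a stand-in for SQL Server; the Name column and the data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema mirroring the reported PK on (Id, Version).
conn.execute("""CREATE TABLE ProductHistory (
    Id INTEGER, Version INTEGER, ChangeId INTEGER, Name TEXT,
    PRIMARY KEY (Id, Version))""")
conn.executemany("INSERT INTO ProductHistory VALUES (?, ?, ?, ?)",
                 [(1, 1, 5, "a"), (1, 2, 3, "b"), (2, 1, 1, "c"), (2, 2, 2, "d")])

# Group on Id only, then join back on the unique (Id, Version) key: one row per Id, no DISTINCT.
latest = conn.execute("""
    SELECT ph.*
    FROM ProductHistory ph
    JOIN (SELECT Id, MAX(Version) AS MaxVer
          FROM ProductHistory
          GROUP BY Id) m
      ON m.Id = ph.Id AND m.MaxVer = ph.Version
    ORDER BY ph.Id
""").fetchall()
print(latest)  # [(1, 2, 3, 'b'), (2, 2, 2, 'd')]
```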


I think this query needs an index on (Id, changeId, version). Please post the table definition, the indexes on the table, and the execution plan for your query.


This is a bit of a long shot, but I wonder whether partitioning would help:

    SELECT Id
    FROM (
        SELECT Id, version, MAX(version) OVER (PARTITION BY changeId) max_version
        FROM ProductHistory
    ) s
    WHERE s.version = s.max_version
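For reference, a runnable version of the window-aggregate idea in SQLite via Python's sqlite3 (window functions need SQLite 3.25 or later). It compares partitioning by Id, which is a swapped-in variation, with the answer's partitioning by changeId; the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # window functions need SQLite 3.25 or later
conn.execute("CREATE TABLE ProductHistory (Id INTEGER, Version INTEGER, ChangeId INTEGER)")
conn.executemany("INSERT INTO ProductHistory VALUES (?, ?, ?)",
                 [(1, 1, 5), (1, 2, 3), (2, 1, 1), (2, 2, 2)])


def latest_by(partition_col):
    # partition_col is only ever one of our own literal column names, never user input.
    # The inner query must expose Version so the outer WHERE can compare against it.
    sql = f"""
        SELECT Id, Version, ChangeId
        FROM (SELECT Id, Version, ChangeId,
                     MAX(Version) OVER (PARTITION BY {partition_col}) AS max_version
              FROM ProductHistory) s
        WHERE s.Version = s.max_version
        ORDER BY Id, Version
    """
    return conn.execute(sql).fetchall()


print(latest_by("Id"))        # [(1, 2, 3), (2, 2, 2)]: latest version per product
print(latest_by("ChangeId"))  # all four rows, since each ChangeId value occurs only once here
```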

I have a feeling this query will get slower as the number of rows grows, but it's worth a shot:

    SELECT *
    FROM (
        SELECT Col1, Col2, Col3,
               ROW_NUMBER() OVER (PARTITION BY ProductHistory.Id ORDER BY Version DESC, ChangeID DESC) AS RowNumber
        FROM ProductHistory
    ) AS t
    WHERE RowNumber = 1
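A self-contained sketch of the ROW_NUMBER() approach, run in SQLite through Python's sqlite3 (SQLite 3.25 or later) with invented rows; Name stands in for the other columns. Ordering by Version DESC, ChangeId DESC inside a single ROW_NUMBER() picks exactly one row per Id, so no DISTINCT is needed. In SQL Server, an index on (Id, Version DESC, ChangeID DESC) may let the engine avoid a separate sort, but the execution plan is the place to confirm that.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # ROW_NUMBER() needs SQLite 3.25 or later
conn.execute("CREATE TABLE ProductHistory (Id INTEGER, Version INTEGER, ChangeId INTEGER, Name TEXT)")
conn.executemany("INSERT INTO ProductHistory VALUES (?, ?, ?, ?)",
                 [(1, 1, 5, "a"), (1, 2, 3, "b"), (2, 1, 1, "c"), (2, 2, 2, "d")])

latest = conn.execute("""
    SELECT Id, Version, ChangeId, Name
    FROM (SELECT Id, Version, ChangeId, Name,
                 ROW_NUMBER() OVER (PARTITION BY Id
                                    ORDER BY Version DESC, ChangeId DESC) AS RowNumber
          FROM ProductHistory) AS t   -- the derived table needs an alias
    WHERE RowNumber = 1
    ORDER BY Id
""").fetchall()
print(latest)  # [(1, 2, 3, 'b'), (2, 2, 2, 'd')]
```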

Try this CTE; it should be the fastest option, and you probably won't even need indexes to get good speed:

    with mysuperfastcte as (
        select product.*,
               row_number() over (partition by id order by version desc) as versionorder,
               row_number() over (partition by id order by changeid desc) as changeorder
        from ProductHistory as product
    )
    select distinct product.*
    from mysuperfastcte as product
    where versionorder = 1 and changeorder = 1;

NB: I think there may be a bug in this part of your code, so please check it and compare the expected results with those from my code:

  and product.changeId = (SELECT max(changeid) from ProductHistory p3 where p2.changeId = p3.changeId)) 
  • You are trying to get max(changeid) with a correlated subquery, but you also correlate on changeid itself (p2.changeId = p3.changeId), so the subquery simply returns p2's own changeid rather than a maximum. Presumably that is not what you intended?
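To see this effect, here is the question's first query run against a few invented rows in SQLite (via Python's sqlite3). Because the innermost subquery just returns p2.changeId, the middle subquery computes the maximum version per (Id, changeId) pair, so with distinct changeId values every row passes the filter, older versions included.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ProductHistory (Id INTEGER, Version INTEGER, ChangeId INTEGER)")
conn.executemany("INSERT INTO ProductHistory VALUES (?, ?, ?)",
                 [(1, 1, 5), (1, 2, 3), (2, 1, 1), (2, 2, 2)])

original = """
SELECT DISTINCT product.*
FROM ProductHistory product
WHERE product.Version = (
    SELECT MAX(Version) FROM ProductHistory p2
    WHERE product.Id = p2.Id
      AND product.ChangeId = (SELECT MAX(ChangeId) FROM ProductHistory p3
                              WHERE p2.ChangeId = p3.ChangeId))
ORDER BY product.Id, product.Version
"""
result = conn.execute(original).fetchall()
# Older versions such as (1, 1, 5) and (2, 1, 1) come back too.
print(result)  # [(1, 1, 5), (1, 2, 3), (2, 1, 1), (2, 2, 2)]
```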

Also, obviously, return only the columns you actually need. Then run the following statement before executing the query and check the Messages output:

SET STATISTICS IO ON

Look for tables with a high number of logical reads and work out where an index would help.

Tip: if my code works for you, then depending on the columns you need, you could create an index like this on ProductHistory:

    CREATE INDEX ix1 ON ProductHistory (id, version DESC) INCLUDE (changeid, ....)
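SQLite has no INCLUDE clause, so the closest portable analogue of that covering index is to append ChangeId as a trailing key column. This sketch (via Python's sqlite3, hypothetical table) checks the plan: when a query reads only indexed columns, SQLite reports a covering index, and the index order on Version lets it skip a separate sort step. Exact plan wording varies by SQLite version.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ProductHistory (Id INTEGER, Version INTEGER, ChangeId INTEGER, Name TEXT)")
# No INCLUDE in SQLite: ChangeId becomes a trailing key column instead.
conn.execute("CREATE INDEX ix1 ON ProductHistory (Id, Version DESC, ChangeId)")

query = "SELECT Id, Version, ChangeId FROM ProductHistory WHERE Id = 4 ORDER BY Version DESC"
details = [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + query)]
# Expect a SEARCH ... USING COVERING INDEX ix1 step and no temp B-tree for the ORDER BY.
print(details)
```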

Hope this helps!


Generally speaking, without a suitable index, SELECT MAX() has to go through the entire table, and you do it twice.

SELECT TOP 1 is faster, but you need to make sure you have the right index and the correct ORDER BY. See if you can experiment with this.
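A sketch of the TOP 1 idea, using SQLite's LIMIT 1 through Python's sqlite3 as a stand-in: a correlated subquery picks the newest row per Id with ORDER BY Version DESC, ChangeId DESC, and the index matches that order. Matching on rowid is a SQLite-specific device; in SQL Server this pattern is commonly written with CROSS APPLY (SELECT TOP 1 ... ORDER BY ...). The table, index name and rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ProductHistory (Id INTEGER, Version INTEGER, ChangeId INTEGER, Name TEXT)")
# Index order matches the ORDER BY in the subquery below.
conn.execute("CREATE INDEX ix_latest ON ProductHistory (Id, Version DESC, ChangeId DESC)")
conn.executemany("INSERT INTO ProductHistory VALUES (?, ?, ?, ?)",
                 [(1, 1, 5, "a"), (1, 2, 3, "b"), (2, 1, 1, "c"), (2, 2, 2, "d")])

# For each row, keep it only if it is the single newest row for its Id (LIMIT 1 ~ TOP 1).
latest = conn.execute("""
    SELECT ph.Id, ph.Version, ph.ChangeId, ph.Name
    FROM ProductHistory ph
    WHERE ph.rowid = (SELECT p2.rowid
                      FROM ProductHistory p2
                      WHERE p2.Id = ph.Id
                      ORDER BY p2.Version DESC, p2.ChangeId DESC
                      LIMIT 1)
    ORDER BY ph.Id
""").fetchall()
print(latest)  # [(1, 2, 3, 'b'), (2, 2, 2, 'd')]
```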


Source: https://habr.com/ru/post/893147/

