I have a data load script where I create a dynamic SQL query to retrieve data and cache in our service. There is 1 table containing all the data about the product: ProductHistory (47 columns, 200,000 + records and will continue to grow)
What I need: Get the latest products using maximum id, maximum version and maximum change.
First try:
SELECT distinct Product.* FROM ProductHistory product WHERE product.version = (SELECT max(version) from ProductHistory p2 where product.Id = p2.Id and product.changeId = (SELECT max(changeid) from ProductHistory p3 where p2.changeId = p3.changeId))
It took more than 2.51 minutes.
Other unsuccessful attempts:
select distinct product.* from ProductHistory product where CAST(CAST(id as nvarchar)+'0'+CAST(Version as nvarchar)+'0'+CAST(changeid as nvarchar) as decimal) = (select MAX(CAST(CAST(id as nvarchar)+'0'+CAST(Version as nvarchar)+'0'+CAST(changeid as nvarchar) as decimal)) from ProductHistory p2 where product.Id = p2.Id)
Basically, he uses the same principle as when ordering dates, combining numbers ordered by relevance.
For example 11 Jun 2007 = 20070711 And in our case: Id = 4 , version = 127, changeid = 32 => 40127032 The zeros are there not to mix up the 3 different ids
But it will take 3.10 minutes !!! :(
So, I basically need a way to make my first request better. I was also interested with so much data, is this the best extraction speed I should expect?
I ran sp_helpindex ProductHistory and found the indexes as shown below:
PK_ProductHistoryNew - a cluster, unique primary key located on PRIMARY-Id, Version
I wrapped the first request in SP, but still no changes.
So, I wonder what other means we can improve the performance of this operation?
Thanks, Mani ps: I just run these queries in stuido SQL management to see the time.