Simple self-help

Can anyone advise how to improve the performance of the next query. Note: the problem seems to be caused by the where clause.

Data (the table contains a huge set of rows - 500K +, a set of called parameters with the assumption of returning 2-5K records per request, which currently takes 8-10 minutes):

 USE [SomeDb] GO SET ANSI_NULLS ON GO SET QUOTED_IDENTIFIER ON GO CREATE TABLE [dbo].[Data]( [x] [money] NOT NULL, [y] [money] NOT NULL, CONSTRAINT [PK_Data] PRIMARY KEY CLUSTERED ( [x] ASC )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY] ) ON [PRIMARY] GO 

Request

 select top 10000 sx as sx, ex as ex, sy as sy, ey as ey, ey - sy as y_delta, ex - sx as x_delta from Data s inner join Data e on ex > sx and ex - sx between xFrom and xTo --where ey - sy > @yDelta -- when uncommented causes a huge delay 

Update 1 - Implementation Plan


 <?xml version="1.0" encoding="utf-16"?> <ShowPlanXML xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" Version="1.2" Build="11.0.2100.60" xmlns="http://schemas.microsoft.com/sqlserver/2004/07/showplan"> <BatchSequence> <Batch> <Statements> <StmtSimple StatementCompId="1" StatementEstRows="100" StatementId="1" StatementOptmLevel="FULL" StatementOptmEarlyAbortReason="GoodEnoughPlanFound" StatementSubTreeCost="0.0263655" StatementText="select top 100&#xD;&#xA;sx as sx,&#xD;&#xA;ex as ex,&#xD;&#xA;sy as sy,&#xD;&#xA;ey as ey,&#xD;&#xA;ey - sy as y_delta,&#xD;&#xA;ex - sx as x_delta&#xD;&#xA;from Data s &#xD;&#xA; inner join Data e&#xD;&#xA; on ex &gt; sx and ex - sx between 100 and 105&#xD;&#xA;where ey - sy &gt; 0.01&#xD;&#xA;" StatementType="SELECT" QueryHash="0xAAAC02AC2D78CB56" QueryPlanHash="0x747994153CB2D637" RetrievedFromCache="true"> <StatementSetOptions ANSI_NULLS="true" ANSI_PADDING="true" ANSI_WARNINGS="true" ARITHABORT="true" CONCAT_NULL_YIELDS_NULL="true" NUMERIC_ROUNDABORT="false" QUOTED_IDENTIFIER="true" /> <QueryPlan DegreeOfParallelism="0" NonParallelPlanReason="NoParallelPlansInDesktopOrExpressEdition" CachedPlanSize="24" CompileTime="13" CompileCPU="13" CompileMemory="424"> <MemoryGrantInfo SerialRequiredMemory="0" SerialDesiredMemory="0" /> <OptimizerHardwareDependentProperties EstimatedAvailableMemoryGrant="52199" EstimatedPagesCached="14561" EstimatedAvailableDegreeOfParallelism="4" /> <RelOp AvgRowSize="55" EstimateCPU="1E-05" EstimateIO="0" EstimateRebinds="0" EstimateRewinds="0" EstimatedExecutionMode="Row" EstimateRows="100" LogicalOp="Compute Scalar" NodeId="0" Parallel="false" PhysicalOp="Compute Scalar" EstimatedTotalSubtreeCost="0.0263655"> <OutputList> <ColumnReference Database="[SomeDb]" Schema="[dbo]" Table="[Data]" Alias="[s]" Column="x" /> <ColumnReference Database="[SomeDb]" Schema="[dbo]" Table="[Data]" Alias="[s]" Column="y" /> <ColumnReference Database="[SomeDb]" Schema="[dbo]" Table="[Data]" Alias="[e]" Column="x" /> <ColumnReference Database="[SomeDb]" Schema="[dbo]" Table="[Data]" Alias="[e]" Column="y" /> <ColumnReference Column="Expr1004" /> <ColumnReference Column="Expr1005" /> </OutputList> <ComputeScalar> <DefinedValues> <DefinedValue> <ColumnReference Column="Expr1004" /> <ScalarOperator ScalarString="[SomeDb].[dbo].[Data].[y] as [e].[y]-[SomeDb].[dbo].[Data].[y] as [s].[y]"> <Arithmetic Operation="SUB"> <ScalarOperator> <Identifier> <ColumnReference Database="[SomeDb]" Schema="[dbo]" Table="[Data]" Alias="[e]" Column="y" /> </Identifier> </ScalarOperator> <ScalarOperator> <Identifier> <ColumnReference Database="[SomeDb]" Schema="[dbo]" Table="[Data]" Alias="[s]" Column="y" /> </Identifier> </ScalarOperator> </Arithmetic> </ScalarOperator> </DefinedValue> <DefinedValue> <ColumnReference Column="Expr1005" /> <ScalarOperator ScalarString="[SomeDb].[dbo].[Data].[x] as [e].[x]-[SomeDb].[dbo].[Data].[x] as [s].[x]"> <Arithmetic Operation="SUB"> <ScalarOperator> <Identifier> <ColumnReference Database="[SomeDb]" Schema="[dbo]" Table="[Data]" Alias="[e]" Column="x" /> </Identifier> </ScalarOperator> <ScalarOperator> <Identifier> <ColumnReference Database="[SomeDb]" Schema="[dbo]" Table="[Data]" Alias="[s]" Column="x" /> </Identifier> </ScalarOperator> </Arithmetic> </ScalarOperator> </DefinedValue> </DefinedValues> <RelOp AvgRowSize="39" EstimateCPU="1E-05" EstimateIO="0" EstimateRebinds="0" EstimateRewinds="0" EstimatedExecutionMode="Row" EstimateRows="100" LogicalOp="Top" NodeId="1" Parallel="false" PhysicalOp="Top" EstimatedTotalSubtreeCost="0.0263555"> <OutputList> <ColumnReference Database="[SomeDb]" Schema="[dbo]" Table="[Data]" Alias="[s]" Column="x" /> <ColumnReference Database="[SomeDb]" Schema="[dbo]" Table="[Data]" Alias="[s]" Column="y" /> <ColumnReference Database="[SomeDb]" Schema="[dbo]" Table="[Data]" Alias="[e]" Column="x" /> <ColumnReference Database="[SomeDb]" Schema="[dbo]" Table="[Data]" Alias="[e]" Column="y" /> </OutputList> <RunTimeInformation> <RunTimeCountersPerThread Thread="0" ActualRows="100" ActualEndOfScans="1" ActualExecutions="1" /> </RunTimeInformation> <Top RowCount="false" IsPercent="false" WithTies="false"> <TopExpression> <ScalarOperator ScalarString="(100)"> <Const ConstValue="(100)" /> </ScalarOperator> </TopExpression> <RelOp AvgRowSize="39" EstimateCPU="151828" EstimateIO="0" EstimateRebinds="0" EstimateRewinds="0" EstimatedExecutionMode="Row" EstimateRows="100" LogicalOp="Inner Join" NodeId="2" Parallel="false" PhysicalOp="Nested Loops" EstimatedTotalSubtreeCost="0.0263455"> <OutputList> <ColumnReference Database="[SomeDb]" Schema="[dbo]" Table="[Data]" Alias="[s]" Column="x" /> <ColumnReference Database="[SomeDb]" Schema="[dbo]" Table="[Data]" Alias="[s]" Column="y" /> <ColumnReference Database="[SomeDb]" Schema="[dbo]" Table="[Data]" Alias="[e]" Column="x" /> <ColumnReference Database="[SomeDb]" Schema="[dbo]" Table="[Data]" Alias="[e]" Column="y" /> </OutputList> <RunTimeInformation> <RunTimeCountersPerThread Thread="0" ActualRows="100" ActualEndOfScans="0" ActualExecutions="1" /> </RunTimeInformation> <NestedLoops Optimized="false"> <OuterReferences> <ColumnReference Database="[SomeDb]" Schema="[dbo]" Table="[Data]" Alias="[e]" Column="x" /> <ColumnReference Database="[SomeDb]" Schema="[dbo]" Table="[Data]" Alias="[e]" Column="y" /> </OuterReferences> <RelOp AvgRowSize="23" EstimateCPU="1.80448" EstimateIO="3.76461" EstimateRebinds="0" EstimateRewinds="0" EstimatedExecutionMode="Row" EstimateRows="1" LogicalOp="Clustered Index Scan" NodeId="3" Parallel="false" PhysicalOp="Clustered Index Scan" EstimatedTotalSubtreeCost="0.0032831" TableCardinality="1640290"> <OutputList> <ColumnReference Database="[SomeDb]" Schema="[dbo]" Table="[Data]" Alias="[e]" Column="x" /> <ColumnReference Database="[SomeDb]" Schema="[dbo]" Table="[Data]" Alias="[e]" Column="y" /> </OutputList> <RunTimeInformation> <RunTimeCountersPerThread Thread="0" ActualRows="15225" ActualEndOfScans="0" ActualExecutions="1" /> </RunTimeInformation> <IndexScan Ordered="false" ForcedIndex="false" ForceScan="false" NoExpandHint="false"> <DefinedValues> <DefinedValue> <ColumnReference Database="[SomeDb]" Schema="[dbo]" Table="[Data]" Alias="[e]" Column="x" /> </DefinedValue> <DefinedValue> <ColumnReference Database="[SomeDb]" Schema="[dbo]" Table="[Data]" Alias="[e]" Column="y" /> </DefinedValue> </DefinedValues> <Object Database="[SomeDb]" Schema="[dbo]" Table="[Data]" Index="[PK_Data]" Alias="[e]" IndexKind="Clustered" /> </IndexScan> </RelOp> <RelOp AvgRowSize="23" EstimateCPU="0.902317" EstimateIO="1.88387" EstimateRebinds="1" EstimateRewinds="0" EstimatedExecutionMode="Row" EstimateRows="100" LogicalOp="Clustered Index Seek" NodeId="4" Parallel="false" PhysicalOp="Clustered Index Seek" EstimatedTotalSubtreeCost="0.0263655" TableCardinality="1640290"> <OutputList> <ColumnReference Database="[SomeDb]" Schema="[dbo]" Table="[Data]" Alias="[s]" Column="x" /> <ColumnReference Database="[SomeDb]" Schema="[dbo]" Table="[Data]" Alias="[s]" Column="y" /> </OutputList> <RunTimeInformation> <RunTimeCountersPerThread Thread="0" ActualRows="100" ActualEndOfScans="15224" ActualExecutions="15225" /> </RunTimeInformation> <IndexScan Ordered="true" ScanDirection="FORWARD" ForcedIndex="false" ForceSeek="false" ForceScan="false" NoExpandHint="false" Storage="RowStore"> <DefinedValues> <DefinedValue> <ColumnReference Database="[SomeDb]" Schema="[dbo]" Table="[Data]" Alias="[s]" Column="x" /> </DefinedValue> <DefinedValue> <ColumnReference Database="[SomeDb]" Schema="[dbo]" Table="[Data]" Alias="[s]" Column="y" /> </DefinedValue> </DefinedValues> <Object Database="[SomeDb]" Schema="[dbo]" Table="[Data]" Index="[PK_Data]" Alias="[s]" IndexKind="Clustered" /> <SeekPredicates> <SeekPredicateNew> <SeekKeys> <EndRange ScanType="LT"> <RangeColumns> <ColumnReference Database="[SomeDb]" Schema="[dbo]" Table="[Data]" Alias="[s]" Column="x" /> </RangeColumns> <RangeExpressions> <ScalarOperator ScalarString="[SomeDb].[dbo].[Data].[x] as [e].[x]"> <Identifier> <ColumnReference Database="[SomeDb]" Schema="[dbo]" Table="[Data]" Alias="[e]" Column="x" /> </Identifier> </ScalarOperator> </RangeExpressions> </EndRange> </SeekKeys> </SeekPredicateNew> </SeekPredicates> <Predicate> <ScalarOperator ScalarString="([SomeDb].[dbo].[Data].[x] as [e].[x]-[SomeDb].[dbo].[Data].[x] as [s].[x])&gt;=($100.0000) AND ([SomeDb].[dbo].[Data].[x] as [e].[x]-[SomeDb].[dbo].[Data].[x] as [s].[x])&lt;=($105.0000) AND ([SomeDb].[dbo].[Data].[y] as [e].[y]-[SomeDb].[dbo].[Data].[y] as [s].[y])&gt;(0.01)"> <Logical Operation="AND"> <ScalarOperator> <Compare CompareOp="GE"> <ScalarOperator> <Arithmetic Operation="SUB"> <ScalarOperator> <Identifier> <ColumnReference Database="[SomeDb]" Schema="[dbo]" Table="[Data]" Alias="[e]" Column="x" /> </Identifier> </ScalarOperator> <ScalarOperator> <Identifier> <ColumnReference Database="[SomeDb]" Schema="[dbo]" Table="[Data]" Alias="[s]" Column="x" /> </Identifier> </ScalarOperator> </Arithmetic> </ScalarOperator> <ScalarOperator> <Const ConstValue="($100.0000)" /> </ScalarOperator> </Compare> </ScalarOperator> <ScalarOperator> <Compare CompareOp="LE"> <ScalarOperator> <Arithmetic Operation="SUB"> <ScalarOperator> <Identifier> <ColumnReference Database="[SomeDb]" Schema="[dbo]" Table="[Data]" Alias="[e]" Column="x" /> </Identifier> </ScalarOperator> <ScalarOperator> <Identifier> <ColumnReference Database="[SomeDb]" Schema="[dbo]" Table="[Data]" Alias="[s]" Column="x" /> </Identifier> </ScalarOperator> </Arithmetic> </ScalarOperator> <ScalarOperator> <Const ConstValue="($105.0000)" /> </ScalarOperator> </Compare> </ScalarOperator> <ScalarOperator> <Compare CompareOp="GT"> <ScalarOperator> <Arithmetic Operation="SUB"> <ScalarOperator> <Identifier> <ColumnReference Database="[SomeDb]" Schema="[dbo]" Table="[Data]" Alias="[e]" Column="y" /> </Identifier> </ScalarOperator> <ScalarOperator> <Identifier> <ColumnReference Database="[SomeDb]" Schema="[dbo]" Table="[Data]" Alias="[s]" Column="y" /> </Identifier> </ScalarOperator> </Arithmetic> </ScalarOperator> <ScalarOperator> <Const ConstValue="(0.01)" /> </ScalarOperator> </Compare> </ScalarOperator> </Logical> </ScalarOperator> </Predicate> </IndexScan> </RelOp> </NestedLoops> </RelOp> </Top> </RelOp> </ComputeScalar> </RelOp> </QueryPlan> </StmtSimple> </Statements> </Batch> </BatchSequence> </ShowPlanXML> 
+4
source share
3 answers

I often saw big performance gains by inserting the results of the first query (in your case without the where clause) into a TEMP table or table variable and then choosing from it (which basically helps the query optimizer choose the appropriate execution plan).

Also noticed that you do not have an index on column Y, which can speed up a bit.

EDIT Also try the following (gives me a slightly better performance):

 SELECT * FROM (SELECT sx as sx, ex as ex, sy as sy, ey as ey, ey - sy as y_delta, ex - sx as x_delta FROM Data s JOIN Data e ON ex > sx ) data WHERE data.y_delta > @yDelta AND data.x_delta BETWEEN @xFrom AND @xTo 
+3
source

The main problems are that the where clause gives a cross join (not what I would call a simple join) (giving a lot of lines, this is that the union compares a lot of lines in e for a line in s), as well as a .y comparison should use a table scan (at least temporary data, if not the entire table).

If the query is distributed, there is a possible fix for a one-time combining and copying of data into a previously calculated table and indexing the differences.

+1
source

I might think of an odd performance improvement strategy. This includes adding an automatically incrementing id to the table, and then searching for borders for Xfrom and Xto in terms of id, and then searching for something in the range.

The following query assumes what I mean:

 with e1 as ( select e.*, (select min(id) from data s where ex between sx + @xfrom and sx + @xto) as idstart, (select max(id) from data s where ex between sx + @xfrom and sx + @xto) as idend from data e ) select <whatever> from e1 join e1 s on e1.idstart = s.id where e1.idstart <= e1.idend union all select <whatever> from e1 join e1 s on e1.idstart+1 = s.id where e1.idstart+1 <= e1.idend . . . 

I got lucky with a correlated subquery in an indexed field returning the next value. After this connection are equijoins, which must be very fast. In the end, you would like to modify the query so that it can perform comparisons using a table of enumerated values.

+1
source

Source: https://habr.com/ru/post/1433672/


All Articles