How can I improve this SQL query?

Question

Today I ran into an interesting SQL problem, and while I came up with a solution that works, I doubt it is the best or most efficient answer. I defer to the experts here: help me learn something and improve my query! The RDBMS is SQL Server 2008 R2, and the query is part of an SSRS report that will run against about 100,000 rows.

Essentially, I have a list of IDs that can each have multiple values associated with them; the values are Yes, No, or some other string. For ID x, if any of the values is Yes, x should be Yes; if all of them are No, it should be No; if they contain any value other than Yes and No, display that value. I only want to return one row per ID, with no duplicates.

Simplified version and test case:

    DECLARE @tempTable table ( ID int, Val varchar(1) )

    INSERT INTO @tempTable ( ID, Val ) VALUES ( 10, 'Y')
    INSERT INTO @tempTable ( ID, Val ) VALUES ( 11, 'N')
    INSERT INTO @tempTable ( ID, Val ) VALUES ( 11, 'N')
    INSERT INTO @tempTable ( ID, Val ) VALUES ( 12, 'Y')
    INSERT INTO @tempTable ( ID, Val ) VALUES ( 12, 'Y')
    INSERT INTO @tempTable ( ID, Val ) VALUES ( 12, 'Y')
    INSERT INTO @tempTable ( ID, Val ) VALUES ( 13, 'N')
    INSERT INTO @tempTable ( ID, Val ) VALUES ( 14, 'Y')
    INSERT INTO @tempTable ( ID, Val ) VALUES ( 14, 'N')
    INSERT INTO @tempTable ( ID, Val ) VALUES ( 15, 'Y')
    INSERT INTO @tempTable ( ID, Val ) VALUES ( 16, 'Y')
    INSERT INTO @tempTable ( ID, Val ) VALUES ( 17, 'F')
    INSERT INTO @tempTable ( ID, Val ) VALUES ( 18, 'P')

    SELECT DISTINCT t.ID, COALESCE(t2.Val, t3.Val, t4.Val)
    FROM @tempTable t
    LEFT JOIN (
        SELECT ID, Val
        FROM @tempTable
        WHERE Val = 'Y'
    ) t2 ON t.ID = t2.ID
    LEFT JOIN (
        SELECT ID, Val
        FROM @tempTable
        WHERE Val = 'N'
    ) t3 ON t.ID = t3.ID
    LEFT JOIN (
        SELECT ID, Val
        FROM @tempTable
        WHERE Val <> 'Y' AND Val <> 'N'
    ) t4 ON t.ID = t4.ID

Thanks in advance.

+4

sql tsql aggregate sql-server-2008-r2

eSamuel Aug 25 '11 at 23:19


4 answers

I would rewrite it like this to make it easier to read:

    SELECT DISTINCT t.ID, COALESCE(t2.Val, t3.Val, t4.Val)
    FROM @tempTable t
    LEFT JOIN @tempTable t2 ON t.ID = t2.ID and t2.Val = 'Y'
    LEFT JOIN @tempTable t3 ON t.ID = t3.ID and t3.Val = 'N'
    LEFT JOIN @tempTable t4 ON t.ID = t4.ID and t4.Val <> 'Y' AND t4.Val <> 'N'

This returns the same results as your query.

I also compared the execution plans for both, and they looked exactly the same, so I doubt you will see a difference in performance.
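One way to check this kind of claim on your own data is to run both variants with I/O and timing statistics switched on and compare the figures reported in the Messages output. The following is only a sketch: the four-row sample is hypothetical, and numbers from such a small table variable say little about the real 100,000-row table.

```sql
-- Hypothetical four-row sample; substitute the real table for a meaningful comparison.
DECLARE @tempTable TABLE (ID int, Val varchar(1));
INSERT INTO @tempTable (ID, Val)
VALUES (10, 'Y'), (11, 'N'), (11, 'N'), (17, 'F');

-- Report logical reads and CPU/elapsed time for each statement that follows.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- Variant 1: derived tables, as in the question.
SELECT DISTINCT t.ID, COALESCE(t2.Val, t3.Val, t4.Val) AS Val
FROM @tempTable t
LEFT JOIN (SELECT ID, Val FROM @tempTable WHERE Val = 'Y') t2 ON t.ID = t2.ID
LEFT JOIN (SELECT ID, Val FROM @tempTable WHERE Val = 'N') t3 ON t.ID = t3.ID
LEFT JOIN (SELECT ID, Val FROM @tempTable WHERE Val <> 'Y' AND Val <> 'N') t4 ON t.ID = t4.ID;

-- Variant 2: filters moved into the join conditions, as in this answer.
SELECT DISTINCT t.ID, COALESCE(t2.Val, t3.Val, t4.Val) AS Val
FROM @tempTable t
LEFT JOIN @tempTable t2 ON t.ID = t2.ID AND t2.Val = 'Y'
LEFT JOIN @tempTable t3 ON t.ID = t3.ID AND t3.Val = 'N'
LEFT JOIN @tempTable t4 ON t.ID = t4.ID AND t4.Val <> 'Y' AND t4.Val <> 'N';

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```

If the two variants report the same logical reads and near-identical times, that supports the observation that the optimizer treats them alike.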

+3

Abe miessler Aug 25 '11 at 23:25


Try the following:

    ;WITH a AS (
        SELECT ID,
               SUM(CASE Val WHEN 'Y' THEN 1 ELSE 0 END) AS y,
               SUM(CASE Val WHEN 'N' THEN 0 ELSE 1 END) AS n,
               MIN(CASE WHEN Val IN ('Y','N') THEN NULL ELSE Val END) AS first_other
        FROM @tempTable
        GROUP BY ID
    )
    SELECT ID,
           CASE WHEN y > 0 THEN 'Y'
                WHEN n = 0 THEN 'N'
                ELSE first_other
           END AS Val
    FROM a

If there are any "Y" values, the sum y will be greater than 0.
If all values are "N", the sum n (which counts the values that are not "N") will be zero.
Otherwise, take the first value that is not "Y" or "N".
This way, the result can be determined with only one pass through the table.
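To see how the three aggregates behave, it can help to look at the intermediate columns for a few IDs before the final CASE is applied. This is a minimal sketch on hypothetical sample rows; the name @sample and the column aliases y_count and non_n_count are illustrative, not part of the answer.

```sql
-- Hypothetical sample data; @sample is an illustrative name.
DECLARE @sample TABLE (ID int, Val varchar(1));
INSERT INTO @sample (ID, Val)
VALUES (10, 'Y'), (11, 'N'), (11, 'N'), (14, 'Y'), (14, 'N'), (17, 'F');

SELECT ID,
       -- Number of 'Y' rows for this ID.
       SUM(CASE Val WHEN 'Y' THEN 1 ELSE 0 END) AS y_count,
       -- Number of rows that are NOT 'N'; zero means every row is 'N'.
       SUM(CASE Val WHEN 'N' THEN 0 ELSE 1 END) AS non_n_count,
       -- Smallest value that is neither 'Y' nor 'N' (NULL if there is none).
       MIN(CASE WHEN Val IN ('Y', 'N') THEN NULL ELSE Val END) AS first_other
FROM @sample
GROUP BY ID
ORDER BY ID;

-- ID  y_count  non_n_count  first_other
-- 10  1        1            NULL
-- 11  0        0            NULL
-- 14  1        1            NULL
-- 17  0        1            F
```

Because the 'Y' check comes first in the final CASE, ID 14 resolves to 'Y' even though its non_n_count is not zero.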

+3

8kb Aug 26 '11 at 6:26


I read your specification as follows:

(1) if any value for an ID is Y, then Y
(2) if all values for an ID are N, then N
(3) otherwise, display the value (other than Y or N)

Delete the non-Y rows for the IDs covered by (1):

    delete from @tempTable
    where not Val='Y'
      and ID in ( select distinct ID from @tempTable where Val='Y' )

Select distinct rows to eliminate the multiple N values in (2):

 select distinct * from @tempTable

Group the different "other" values to get one row per ID:

    SELECT A.Id,
           AllVals = SubString(
               (SELECT ', ' + B.Val
                FROM C as B
                WHERE A.Id = B.Id
                FOR XML PATH ( '' ) ), 3, 1000)
    FROM C as A
    GROUP BY Id

The complete query:

    declare @tempTable table (ID int, Val char(1))

    INSERT INTO @tempTable ( ID, Val ) VALUES ( 10, 'Y')
    INSERT INTO @tempTable ( ID, Val ) VALUES ( 11, 'N')
    INSERT INTO @tempTable ( ID, Val ) VALUES ( 11, 'N')
    INSERT INTO @tempTable ( ID, Val ) VALUES ( 12, 'Y')
    INSERT INTO @tempTable ( ID, Val ) VALUES ( 12, 'Y')
    INSERT INTO @tempTable ( ID, Val ) VALUES ( 12, 'Y')
    INSERT INTO @tempTable ( ID, Val ) VALUES ( 13, 'N')
    INSERT INTO @tempTable ( ID, Val ) VALUES ( 14, 'Y')
    INSERT INTO @tempTable ( ID, Val ) VALUES ( 14, 'N')
    INSERT INTO @tempTable ( ID, Val ) VALUES ( 15, 'Y')
    INSERT INTO @tempTable ( ID, Val ) VALUES ( 16, 'Y')
    INSERT INTO @tempTable ( ID, Val ) VALUES ( 17, 'F')
    INSERT INTO @tempTable ( ID, Val ) VALUES ( 18, 'P')
    INSERT INTO @tempTable ( ID, Val ) VALUES ( 18, 'F')

    delete from @tempTable
    where not Val='Y'
      and ID in ( select distinct ID from @tempTable where Val='Y' );

    WITH C as (select distinct * from @tempTable)
    SELECT A.Id,
           AllVals = SubString(
               (SELECT ', ' + B.Val
                FROM C as B
                WHERE A.Id = B.Id
                FOR XML PATH ( '' ) ), 3, 1000)
    FROM C as A
    GROUP BY Id

OUTPUT:

    Id  AllVals
    10  Y
    11  N
    12  Y
    13  N
    14  Y
    15  Y
    16  Y
    17  F
    18  F, P

+2

Aaron anodide Aug 26 '11 at 2:16


Andrew Lazarus · Accepted Answer · 2011-08-26T00:57:59+0000

Let me answer an easier problem first: for each ID, get the Val that comes last in the alphabet. This works if Y and N are the only values, and the query is much simpler:

 SELECT t.ID, MAX(t.Val) FROM t GROUP BY t.ID;

So, reduce your case to this simple case. Use an enum (if your database supports it), or put the value codes in another table with a sort column (in this case you could have 1 for Y, 2 for N, and 999 for all other possible values, and you want the smallest one). Then:

    SELECT ID, c.Val
    FROM (
        SELECT t.ID, MIN(codes.collation) AS mx
        FROM t
        join codes on t.Val = codes.Val
        GROUP BY t.ID
    ) AS q
    JOIN codes c ON mx=c.collation;

The codes table here has two columns: Val and collation.
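As a rough illustration of the lookup-table idea, the sketch below builds a small codes table and runs the MIN-and-join-back query against it. The table variables @t and @codes and the sample rows are hypothetical, and the column is named sort_order here purely for illustration (it plays the role of the collation column). It also assumes each code has its own sort value; if several codes shared a value such as 999, the join back could return more than one row per ID.

```sql
-- Hypothetical sample data; @t and @codes stand in for the answer's t and codes tables.
DECLARE @t TABLE (ID int, Val varchar(1));
INSERT INTO @t (ID, Val)
VALUES (11, 'N'), (11, 'N'), (14, 'Y'), (14, 'N'), (17, 'F');

-- sort_order plays the role of the answer's collation column.
-- Assumption: each code gets its own sort value, so the join back is unambiguous.
DECLARE @codes TABLE (Val varchar(1) PRIMARY KEY, sort_order int NOT NULL UNIQUE);
INSERT INTO @codes (Val, sort_order)
VALUES ('Y', 1), ('N', 2), ('F', 999);

SELECT q.ID, c.Val
FROM (
    -- Lowest sort value present for each ID.
    SELECT t.ID, MIN(codes.sort_order) AS mx
    FROM @t AS t
    JOIN @codes AS codes ON t.Val = codes.Val
    GROUP BY t.ID
) AS q
-- Translate the winning sort value back into its code.
JOIN @codes AS c ON q.mx = c.sort_order
ORDER BY q.ID;

-- ID  Val
-- 11  N
-- 14  Y
-- 17  F
```

Note that with this ordering an ID holding both N and another value resolves to N, which matches the COALESCE order in the question's query.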

You can also do this with a CTE and ROW_NUMBER(), provided the values are ordered the way you want. This approach has a single join to a small lookup table and should be much, much faster than three separate joins.

    WITH q AS (
        SELECT t.id, t.Val,
               ROW_NUMBER() OVER (PARTITION BY t.id ORDER BY codes.collation) AS r
        FROM t
        JOIN codes ON t.Val = codes.Val
    )
    SELECT q.id, q.Val
    FROM q
    WHERE r = 1;
