How can I improve this SQL query?

Today I ran into an interesting SQL problem, and while I came up with a solution that works, I doubt it is the best or most efficient answer. I defer to the experts here: help me learn something and improve my query! The RDBMS is SQL Server 2008 R2, and the query is part of an SSRS report that will run against about 100,000 rows.

Essentially, I have a list of IDs that can each have multiple values associated with them; the values are Yes, No, or some other string. For ID x, if any of the values is Yes, x should be Yes; if all of them are No, x should be No; and if they contain any value other than Yes and No, that value should be displayed. I only want to return one row per ID, with no duplicates.

Simplified version and test case:

    DECLARE @tempTable table ( ID int, Val varchar(1) )

    INSERT INTO @tempTable ( ID, Val ) VALUES ( 10, 'Y')
    INSERT INTO @tempTable ( ID, Val ) VALUES ( 11, 'N')
    INSERT INTO @tempTable ( ID, Val ) VALUES ( 11, 'N')
    INSERT INTO @tempTable ( ID, Val ) VALUES ( 12, 'Y')
    INSERT INTO @tempTable ( ID, Val ) VALUES ( 12, 'Y')
    INSERT INTO @tempTable ( ID, Val ) VALUES ( 12, 'Y')
    INSERT INTO @tempTable ( ID, Val ) VALUES ( 13, 'N')
    INSERT INTO @tempTable ( ID, Val ) VALUES ( 14, 'Y')
    INSERT INTO @tempTable ( ID, Val ) VALUES ( 14, 'N')
    INSERT INTO @tempTable ( ID, Val ) VALUES ( 15, 'Y')
    INSERT INTO @tempTable ( ID, Val ) VALUES ( 16, 'Y')
    INSERT INTO @tempTable ( ID, Val ) VALUES ( 17, 'F')
    INSERT INTO @tempTable ( ID, Val ) VALUES ( 18, 'P')

    SELECT DISTINCT t.ID, COALESCE(t2.Val, t3.Val, t4.Val)
    FROM @tempTable t
    LEFT JOIN ( SELECT ID, Val FROM @tempTable WHERE Val = 'Y' ) t2 ON t.ID = t2.ID
    LEFT JOIN ( SELECT ID, Val FROM @tempTable WHERE Val = 'N' ) t3 ON t.ID = t3.ID
    LEFT JOIN ( SELECT ID, Val FROM @tempTable WHERE Val <> 'Y' AND Val <> 'N' ) t4 ON t.ID = t4.ID

Thanks in advance.

4 answers

Let me answer an easier task first: for each ID, get the Val that comes last in the alphabet. This will work if Y and N are the only values, and the query is much simpler:

 SELECT t.ID, MAX(t.Val) FROM t GROUP BY t.ID; 
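As a quick illustration, here is a minimal sketch of the MAX approach on a cut-down copy of the question's test data. The table variable @t is only a stand-in name, and the multi-row VALUES syntax assumes SQL Server 2008 or later. It relies on 'Y' sorting after 'N'.

```sql
-- Stand-in table with two IDs from the question's test case.
DECLARE @t table (ID int, Val varchar(1));
INSERT INTO @t (ID, Val) VALUES (11, 'N'), (11, 'N'), (14, 'Y'), (14, 'N');

-- 'Y' sorts after 'N', so MAX picks 'Y' whenever an ID has at least one 'Y'.
SELECT ID, MAX(Val) AS Val
FROM @t
GROUP BY ID
ORDER BY ID;
-- 11  N
-- 14  Y
```

Note that any value sorting after 'Y' (for example 'Z') would win over 'Y', which is why this only covers the case where Y and N are the only values.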

So, reduce your case to that simple case. Use an enum (if your database supports it), or put the value codes in another table with a collation column (in this case you could have 1 for Y, 2 for N, and 999 for all other possible values, and you want the smallest). Then:

    SELECT q.ID, c.Val
    FROM (SELECT t.ID, MIN(codes.collation) AS mx
          FROM t
          JOIN codes ON t.Val = codes.Val
          GROUP BY t.ID) AS q
    JOIN codes c ON q.mx = c.collation;

The codes table here has two columns: Val and collation.
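A lookup table along those lines might be sketched as follows. The names @codes, SortRank, and BestRank are hypothetical (SortRank plays the role of the collation column), and the rank values are only illustrative. This sketch gives every value its own rank, because if several values shared one rank, the join back on the rank would return a row for each of them.

```sql
-- Hypothetical lookup table: one row per known value, lowest SortRank wins.
DECLARE @codes table (Val varchar(1) PRIMARY KEY, SortRank int NOT NULL UNIQUE);
INSERT INTO @codes (Val, SortRank) VALUES ('Y', 1), ('N', 2), ('F', 998), ('P', 999);

-- Cut-down copy of the question's test data.
DECLARE @tempTable table (ID int, Val varchar(1));
INSERT INTO @tempTable (ID, Val) VALUES (11, 'N'), (14, 'Y'), (14, 'N'), (18, 'P'), (18, 'F');

-- Find the best (lowest) rank per ID, then map that rank back to its value.
SELECT q.ID, c.Val
FROM (SELECT t.ID, MIN(k.SortRank) AS BestRank
      FROM @tempTable AS t
      JOIN @codes AS k ON k.Val = t.Val
      GROUP BY t.ID) AS q
JOIN @codes AS c ON c.SortRank = q.BestRank
ORDER BY q.ID;
-- 11  N
-- 14  Y
-- 18  F
```

Because the join to the lookup table is an inner join, any value missing from @codes would be dropped, so every value that can occur needs a row there.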

You can also do this with a CTE and ROW_NUMBER, if you have the values ordered the way you want. This approach has a single join to a small lookup table and should be much, much faster than three separate joins.

    WITH q AS (
        SELECT t.ID, t.Val,
               -- number the rows within each ID, best-ranked value first
               ROW_NUMBER() OVER (PARTITION BY t.ID ORDER BY codes.collation) AS r
        FROM t
        JOIN codes ON t.Val = codes.Val
    )
    SELECT q.ID, q.Val
    FROM q
    WHERE q.r = 1;

I would rewrite it like this to make it easier to read:

    SELECT DISTINCT t.ID, COALESCE(t2.Val, t3.Val, t4.Val)
    FROM @tempTable t
    LEFT JOIN @tempTable t2 ON t.ID = t2.ID AND t2.Val = 'Y'
    LEFT JOIN @tempTable t3 ON t.ID = t3.ID AND t3.Val = 'N'
    LEFT JOIN @tempTable t4 ON t.ID = t4.ID AND t4.Val <> 'Y' AND t4.Val <> 'N'

This returns the same results as your example.

I also compared the execution plans for both, and they looked exactly the same, so I doubt you will see a difference in performance.


Try the following:

    ;WITH a AS (
        SELECT ID,
               SUM(CASE Val WHEN 'Y' THEN 1 ELSE 0 END) AS y,
               SUM(CASE Val WHEN 'N' THEN 0 ELSE 1 END) AS n,
               MIN(CASE WHEN Val IN ('Y','N') THEN NULL ELSE Val END) AS first_other
        FROM @tempTable
        GROUP BY ID
    )
    SELECT ID,
           CASE WHEN y > 0 THEN 'Y'
                WHEN n = 0 THEN 'N'
                ELSE first_other
           END AS Val
    FROM a
  • If there are any 'Y' values, the sum y will be greater than 0.
  • If all values are 'N', the sum n will be zero.
  • Otherwise, take the first value that is neither 'Y' nor 'N'.
  • The result can be determined with only one pass through the table.
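To make the three aggregates concrete, here is a small sketch that runs only the inner SELECT on a cut-down copy of the test data (plus an ID with two "other" values, which is my own addition). Note that n counts the rows that are not 'N', so it is zero only when every row is 'N'.

```sql
-- Cut-down test data; ID 18 with both 'P' and 'F' is an added assumption.
DECLARE @tempTable table (ID int, Val varchar(1));
INSERT INTO @tempTable (ID, Val)
VALUES (11, 'N'), (11, 'N'), (14, 'Y'), (14, 'N'), (18, 'P'), (18, 'F');

-- The intermediate aggregates computed by the CTE, one row per ID.
SELECT ID,
       SUM(CASE Val WHEN 'Y' THEN 1 ELSE 0 END) AS y,            -- number of 'Y' rows
       SUM(CASE Val WHEN 'N' THEN 0 ELSE 1 END) AS n,            -- number of non-'N' rows
       MIN(CASE WHEN Val IN ('Y','N') THEN NULL ELSE Val END) AS first_other
FROM @tempTable
GROUP BY ID
ORDER BY ID;
-- ID  y  n  first_other
-- 11  0  0  NULL
-- 14  1  1  NULL
-- 18  0  2  F
```

With these rows the outer CASE yields N for 11 (n = 0), Y for 14 (y > 0), and F for 18 (falls through to first_other).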

I read your specification as follows:

  • if any value for an ID is Y, then Y
  • if all values for an ID are N, then N
  • otherwise, display the other value (anything except Y or N)

First, delete the non-Y rows for any ID that falls under rule (1):

 delete from @tempTable where not Val='Y' and ID in ( select distinct ID from @tempTable where Val='Y' ) 

Then select distinct rows to eliminate the duplicate N values under rule (2):

 select distinct * from @tempTable 

Finally, group the different "other" values to get one row per ID:

    SELECT A.Id,
           AllVals = SubString(
               (SELECT ', ' + B.Val FROM C as B WHERE A.Id = B.Id FOR XML PATH ( '' ) ),
               3, 1000)
    FROM C as A
    GROUP BY Id

The complete query:

    declare @tempTable table (ID int, Val char(1))

    INSERT INTO @tempTable ( ID, Val ) VALUES ( 10, 'Y')
    INSERT INTO @tempTable ( ID, Val ) VALUES ( 11, 'N')
    INSERT INTO @tempTable ( ID, Val ) VALUES ( 11, 'N')
    INSERT INTO @tempTable ( ID, Val ) VALUES ( 12, 'Y')
    INSERT INTO @tempTable ( ID, Val ) VALUES ( 12, 'Y')
    INSERT INTO @tempTable ( ID, Val ) VALUES ( 12, 'Y')
    INSERT INTO @tempTable ( ID, Val ) VALUES ( 13, 'N')
    INSERT INTO @tempTable ( ID, Val ) VALUES ( 14, 'Y')
    INSERT INTO @tempTable ( ID, Val ) VALUES ( 14, 'N')
    INSERT INTO @tempTable ( ID, Val ) VALUES ( 15, 'Y')
    INSERT INTO @tempTable ( ID, Val ) VALUES ( 16, 'Y')
    INSERT INTO @tempTable ( ID, Val ) VALUES ( 17, 'F')
    INSERT INTO @tempTable ( ID, Val ) VALUES ( 18, 'P')
    INSERT INTO @tempTable ( ID, Val ) VALUES ( 18, 'F')

    delete from @tempTable
    where not Val='Y'
      and ID in ( select distinct ID from @tempTable where Val='Y' );

    WITH C as (select distinct * from @tempTable)
    SELECT A.Id,
           AllVals = SubString(
               (SELECT ', ' + B.Val FROM C as B WHERE A.Id = B.Id FOR XML PATH ( '' ) ),
               3, 1000)
    FROM C as A
    GROUP BY Id

OUTPUT:

    Id  AllVals
    10  Y
    11  N
    12  Y
    13  N
    14  Y
    15  Y
    16  Y
    17  F
    18  F, P

Source: https://habr.com/ru/post/1369066/

