T-SQL - Get a list of all As that have the same set of Bs

Question


I am struggling with a complex SQL query that I am trying to write. Look at the following table:

+---+---+
| A | B |
+---+---+
| 1 | 2 |
| 1 | 3 |
| 2 | 2 |
| 2 | 3 |
| 2 | 4 |
| 3 | 2 |
| 3 | 3 |
| 4 | 2 |
| 4 | 3 |
| 4 | 4 |
+---+---+

From this table I need a list of all As that have the same set of Bs, with each distinct set assigned an incremental identifier.

Therefore, the output set for the above is:

+---+----+
| A | ID |
+---+----+
| 1 | 1  |
| 3 | 1  |
| 2 | 2  |
| 4 | 2  |
+---+----+

Thanks.

Edit: if this helps, I have a list of all the different B values that are possible in another table.

Edit: Thank you so much for all the innovative answers. I could learn a lot.

+6

sql tsql

Karthik iyengar Jun 06 '15 at 7:40


6 answers

Something like this:

    select a, dense_rank() over (order by g) as id_b
    from (
        select a,
               -- one XML string per a, listing its b values in order
               (select b from MyTable s where s.a = a.a order by b FOR XML PATH('')) g
        from MyTable a
        group by a
    ) a
    order by id_b, a

Or perhaps using a CTE (I avoid them when possible).

Sql fiddle

As a side note, this is the output of the inner query using the sample data from the question:

    a  g
    1  <b>2</b><b>3</b>
    2  <b>2</b><b>3</b><b>4</b>
    3  <b>2</b><b>3</b>
    4  <b>2</b><b>3</b><b>4</b>

+3

edc65 Jun 06 '15 at 9:11


EDIT: I have changed the code, which is now longer. To concatenate the strings I used the technique from the question "Merge several lines into one text line?".

    -- Build one comma-separated list of B values per A
    SELECT [A], LEFT(M.[C], LEN(M.[C]) - 1) AS [D]
    INTO #tempSomeTable
    FROM (
        SELECT DISTINCT T2.[A],
               (
                   SELECT CAST(T1.[B] AS VARCHAR) + ',' AS [text()]
                   FROM sometable T1
                   WHERE T1.[A] = T2.[A]
                   ORDER BY T1.[B]   -- order by B so equal sets give equal strings
                   FOR XML PATH ('')
               ) [C]
        FROM sometable T2
    ) M;

    -- Keep only the lists shared by more than one A, then rank them
    SELECT t.[A], DENSE_RANK() OVER (ORDER BY t.[D]) [ID]
    FROM #tempSomeTable t
    INNER JOIN (
        SELECT [D]
        FROM (
            SELECT [D], COUNT([A]) [D_A]
            FROM #tempSomeTable t
            GROUP BY [D]
        ) P
        WHERE [D_A] > 1
    ) t1 ON t1.[D] = t.[D];

+2

debatanu Jun 06 '15 at 8:29


Here's a long-winded approach: find the sets with the same elements (using EXCEPT in both directions to eliminate mismatches, over a half-diagonal Cartesian join), pair up the equal sets, stamp each pair with a ROW_NUMBER(), and then unpivot the pairs of As into the final output, where equivalent sets are projected as rows that have the same Id.

    WITH joinedSets AS
    (
        SELECT t1.A AS t1A, t2.A AS t2A
        FROM MyTable t1
        INNER JOIN MyTable t2
            ON t1.B = t2.B
            AND t1.A < t2.A
    ),
    equalSets AS
    (
        SELECT js.t1A, js.t2A, ROW_NUMBER() OVER (ORDER BY js.t1A) AS Id
        FROM joinedSets js
        GROUP BY js.t1A, js.t2A
        HAVING NOT EXISTS ((SELECT mt.B FROM MyTable mt WHERE mt.A = js.t1A)
                           EXCEPT
                           (SELECT mt.B FROM MyTable mt WHERE mt.A = js.t2A))
           AND NOT EXISTS ((SELECT mt.B FROM MyTable mt WHERE mt.A = js.t2A)
                           EXCEPT
                           (SELECT mt.B FROM MyTable mt WHERE mt.A = js.t1A))
    )
    SELECT A, Id
    FROM equalSets
    UNPIVOT
    (
        A
        FOR ACol IN (t1A, t2A)
    ) unp;

SqlFiddle here

In its current form, this solution only works for pairs of equal sets, not for triples and so on. A general N-tuple solution is probably possible (but it is beyond my brain right now).
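One possible way around the pairs-only limitation, offered as a sketch rather than as part of the original answer: instead of numbering pairs, give every A a representative, namely the smallest A whose set of Bs is identical (possibly itself), and rank by that representative. The table variable @MyTable and the names ids, representatives and RepA are assumptions made for this illustration.

```sql
-- Sample data from the question, loaded into a table variable (an assumption for this sketch).
DECLARE @MyTable TABLE (A int, B int);
INSERT INTO @MyTable VALUES
(1, 2),(1, 3),(2, 2),(2, 3),(2, 4),(3, 2),(3, 3),(4, 2),(4, 3),(4, 4);

WITH ids AS
(
    SELECT DISTINCT A FROM @MyTable
),
representatives AS
(
    -- For each A, the smallest A (possibly itself) with exactly the same set of Bs.
    SELECT x.A, MIN(y.A) AS RepA
    FROM ids x
    CROSS JOIN ids y
    WHERE NOT EXISTS (SELECT m.B FROM @MyTable m WHERE m.A = x.A
                      EXCEPT
                      SELECT m.B FROM @MyTable m WHERE m.A = y.A)
      AND NOT EXISTS (SELECT m.B FROM @MyTable m WHERE m.A = y.A
                      EXCEPT
                      SELECT m.B FROM @MyTable m WHERE m.A = x.A)
    GROUP BY x.A
)
SELECT A, DENSE_RANK() OVER (ORDER BY RepA) AS Id
FROM representatives
ORDER BY Id, A;
```

With the sample data this should pair A = 1 with 3 under Id 1 and A = 2 with 4 under Id 2. Note two differences from the query above: an A whose set matches no other A still gets an Id of its own (adding HAVING COUNT(*) > 1 to the representatives CTE would drop those), and the cross join compares every pair of As, so it is likely to be slow on large tables.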

+2

Stuartlc Jun 06 '15 at 8:44


Here is a very simple, quick, but approximate solution. It is possible that CHECKSUM_AGG returns the same checksum for different sets of B.

    DECLARE @T TABLE (A int, B int);

    INSERT INTO @T VALUES
    (1, 2),(1, 3),(2, 2),(2, 3),(2, 4),(3, 2),(3, 3),(4, 2),(4, 3),(4, 4);

    SELECT
        A
        ,CHECKSUM_AGG(B) AS CheckSumB
        ,ROW_NUMBER() OVER (PARTITION BY CHECKSUM_AGG(B) ORDER BY A) AS GroupNumber
    FROM @T
    GROUP BY A
    ORDER BY A, GroupNumber;

Result set

    A    CheckSumB    GroupNumber
    -----------------------------
    1    1            1
    2    5            1
    3    1            2
    4    5            2
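To make the collision caveat concrete, here is a small sketch that can be run to look for one. It rests on an assumption that is not documented behaviour: for int inputs, CHECKSUM_AGG appears to act like a bitwise XOR, which is at least consistent with the results above (2 XOR 3 = 1 and 2 XOR 3 XOR 4 = 5). If that holds, the sets {1, 2} and {3} would get the same checksum. The table variable @C and its data are made up for this illustration.

```sql
-- Hypothetical data: A = 10 has the set {1, 2}, A = 20 has the set {3}.
DECLARE @C TABLE (A int, B int);
INSERT INTO @C VALUES (10, 1),(10, 2),(20, 3);

-- Compare the checksums of two sets that are clearly different.
SELECT A
      ,CHECKSUM_AGG(B) AS CheckSumB
      ,COUNT(*) AS NumB
FROM @C
GROUP BY A
ORDER BY A;
```

If both rows come back with the same CheckSumB, the checksum alone cannot tell these sets apart. Comparing NumB as well would separate this particular pair, but it would not rule out collisions between sets of equal size.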

For an exact solution, group by A and combine all the values of B into one long (binary) string using FOR XML, CLR, or a T-SQL function. You can then partition ROW_NUMBER by this concatenated string to assign numbers to groups, as shown in other answers.
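On SQL Server 2017 or later, STRING_AGG is another way to build the concatenated string. The following is a minimal sketch, not something taken from the answers here. It assumes each (A, B) pair appears only once, uses DENSE_RANK rather than ROW_NUMBER so that the number is a group identifier, and the names sets and BList are made up for the example.

```sql
DECLARE @T TABLE (A int, B int);
INSERT INTO @T VALUES
(1, 2),(1, 3),(2, 2),(2, 3),(2, 4),(3, 2),(3, 3),(4, 2),(4, 3),(4, 4);

WITH sets AS
(
    -- One ordered, comma-separated list of B values per A, e.g. '2,3'.
    SELECT A
          ,STRING_AGG(CAST(B AS varchar(12)), ',') WITHIN GROUP (ORDER BY B) AS BList
    FROM @T
    GROUP BY A
)
SELECT A, DENSE_RANK() OVER (ORDER BY BList) AS ID
FROM sets
ORDER BY ID, A;
```

With the sample data, A = 1 and 3 share the list '2,3' and A = 2 and 4 share '2,3,4', so the two groups should receive IDs 1 and 2. For very long lists, casting B to varchar(max) instead avoids the 8000-byte limit on the result.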

+2

Vladimir Baranov Jun 06 '15 at 10:00


Here is an exact solution rather than an approximate one. It uses nothing more advanced than INNER JOIN and GROUP BY (and, of course, DENSE_RANK() to get the desired ID).

It is also general, in that it allows values of B to be repeated within a group of A.

    SELECT A, DENSE_RANK() OVER (ORDER BY MIN_EQUIVALENT_A) AS ID
    FROM (
        SELECT MATCHES.A1 AS A, MIN(MATCHES.A2) AS MIN_EQUIVALENT_A
        FROM (
            SELECT T1.A AS A1, T2.A AS A2, COUNT(*) AS NUM_B_VALS_MATCHED
            FROM (
                SELECT A, B, COUNT(*) AS B_VAL_FREQ
                FROM MyTable
                GROUP BY A, B
            ) AS T1
            INNER JOIN (
                SELECT A, B, COUNT(*) AS B_VAL_FREQ
                FROM MyTable
                GROUP BY A, B
            ) AS T2
                ON T1.B = T2.B
                AND T1.B_VAL_FREQ = T2.B_VAL_FREQ
            GROUP BY T1.A, T2.A
        ) AS MATCHES
        INNER JOIN (
            SELECT A, COUNT(DISTINCT B) AS NUM_B_VALS_TOTAL
            FROM MyTable
            GROUP BY A
        ) AS CHECK_TOTALS_A1
            ON MATCHES.A1 = CHECK_TOTALS_A1.A
            AND MATCHES.NUM_B_VALS_MATCHED = CHECK_TOTALS_A1.NUM_B_VALS_TOTAL
        INNER JOIN (
            SELECT A, COUNT(DISTINCT B) AS NUM_B_VALS_TOTAL
            FROM MyTable
            GROUP BY A
        ) AS CHECK_TOTALS_A2
            ON MATCHES.A2 = CHECK_TOTALS_A2.A
            AND MATCHES.NUM_B_VALS_MATCHED = CHECK_TOTALS_A2.NUM_B_VALS_TOTAL
        GROUP BY MATCHES.A1
    ) AS EQUIVALENCE_TABLE
    ORDER BY 2, 1;

0

Slowmagic Jun 27 '15 at 21:25


Giorgi nakeuri · Accepted Answer · 2015-06-06T09:04:07+0000

Here is a math trick to solve your tricky select:

    with pow as (
        select *, b * power(10, row_number() over(partition by a order by b)) as rn
        from t
    )
    select a, dense_rank() over(order by sum(rn)) as rn
    from pow
    group by a
    order by rn, a

Fiddle http://sqlfiddle.com/#!3/6b98d/11

This, of course, will only work for a limited number of values, since otherwise you will get an overflow. Here is a more general solution with strings:

    select a,
           dense_rank() over(order by (select '.' + cast(b as varchar(max))
                                       from t t2
                                       where t1.a = t2.a
                                       order by b
                                       for xml path(''))) rn
    from t t1
    group by a
    order by rn, a

Fiddle http://sqlfiddle.com/#!3/6b98d/29
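A further caveat on the numeric trick at the top of this answer, beyond overflow: if I am reading the query correctly, it is only safe when every b is a single digit, because multi-digit values can produce the same weighted sum for different sets. The sketch below uses made-up data (the table variable @t and the column names weighted and total are assumptions) in which the set {1, 2} and the set {21} both sum to 210, since 1*10 + 2*100 = 210 and 21*10 = 210.

```sql
-- Hypothetical data: a = 1 has the set {1, 2}, a = 2 has the set {21}.
DECLARE @t TABLE (a int, b int);
INSERT INTO @t VALUES (1, 1),(1, 2),(2, 21);

WITH pow AS
(
    -- Same weighting as in the answer: b times 10 to the power of its position.
    SELECT a
          ,b * POWER(10, ROW_NUMBER() OVER (PARTITION BY a ORDER BY b)) AS weighted
    FROM @t
)
SELECT a, SUM(weighted) AS total
FROM pow
GROUP BY a
ORDER BY a;
```

Both rows should show a total of 210, so DENSE_RANK over the sum would put them in the same group. The string version does not have this problem, because the '.' separator keeps '.1.2' distinct from '.21'.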
