Cover Request

Good afternoon, I would like to implement a T-SQL query to fix the coverage problem , but could not find any hints on how to do this in SQL.

In my case, my table has only two columns ( IDnumber and Mut ), and I want to find the minimum number of IDnumber to get one of all Mut . I really want to get three IDnumbers for each Mut , but I decided it was best to start with one, because it could be easier.

 DECLARE @myTable TABLE ( IDnumber int, Mut varchar(1)) INSERT @myTable VALUES (1,'C'), (1,'N'), (1,'Z'), (1,'M'), (1,'E'), (2,'E'), (3,'B'), (3,'N'), (3,'D'), (3,'K'), (3,'W'), (4,'O'), (4,'G'), (4,'N'), (4,'B'), (4,'U'), (4,'C'), (5,'Q'), (5,'H'), (6,'K'), (6,'Y'), (6,'M'), (6,'A'), (6,'O'), (6,'U'), (6,'J'), (7,'H'), (7,'U'), (7,'M'), (7,'L'), (8,'B'), (8,'K'), (8,'P'), (9,'Y'), (9,'K'), (10,'Z'), (11,'R'), (12,'X'), (12,'R'), (12,'O'), (12,'Z'), (4,'C'), (1,'Z'), (4,'S'), (6,'E'), (5,'G'), (4,'C'), (4,'S'), (4,'H'), (6,'D'), (7,'W'), (3,'U'), (6,'N'), (7,'Y'), (6,'N'), (6,'F'), (4,'C'), (4,'I'), (7,'P'), (10,'H'), (10,'Z'), (10,'S'), (7,'Z'), (6,'B'), (7,'Z'), (8,'X'), (8,'J'), (8,'P'), (10,'K'), (8,'K'), (12,'P'), (8,'W'), (10,'M'), (12,'F'), (9,'T'), (9,'D'), (14,'Y'), (12,'P'), (14,'J'), (13,'D'), (15,'H'), (12,'J'), (6,'H'), (2,'Z'), (8,'G'), (10,'Q'), (6,'D'), (5,'X'), (9,'T'), (6,'W'), (6,'K'), (10,'W'), (7,'J'), (11,'W'), (12,'V'), (9,'F'), (7,'F'), (4,'M'), (5,'K'), (12,'G'), (12,'T'), (15,'T'), (13,'W'), (7,'J'), (9,'T'), (10,'U'), (9,'S'), (10,'L'), (10,'J'), (10,'H'), (11,'H'), (12,'S'), (12,'A'), (14,'L'), (13,'K'), (13,'D'), (4,'M'), (3,'N'), (4,'F'), (7,'M'), (7,'V'), (5,'R'), (4,'K'), (5,'F'), (7,'G'), (8,'M'), (4,'X'), (7,'F'), (9,'S'), (7,'N'), (6,'W'), (6,'W'), (5,'S'), (9,'Z'), (10,'I'), (11,'Y'), (11,'D'), (9,'X'), (7,'G'), (9,'S'), (9,'H'), (9,'T'), (8,'J'), (10,'U'), (9,'F'), (9,'S'), (7,'D'), (14,'R'), (10,'F'), (7,'E'), (15,'M'), (12,'F'), (5,'C'), (8,'E'), (16,'G'), (11,'V'), (10,'I'), (12,'I'), (11,'Y'), (12,'I'), (14,'J'), (15,'D'), (19,'J'), (16,'B'), (12,'G'), (9,'J'), (18,'J'), (18,'C'), (16,'Q'), (18,'P'), (13,'F'), (19,'T'), (15,'J'), (15,'R'), (15,'Q'), (15,'O'), (11,'A'), (24,'B'), (19,'S'), (22,'I'), (15,'X'), (20,'T'), (15,'E'), (9,'V'), (8,'H'), (16,'N'), (17,'H') -- Since the above list was generated by a bunch of random numbers/letters I need to -- delete the duplicates ;WITH cte AS ( SELECT IDnumber, mut, row_number() OVER(PARTITION BY IDNumber, mut ORDER BY IDNumber) AS [rn] FROM @myTable ) DELETE cte WHERE [rn] > 1 SELECT * FROM ( SELECT IDnumber, Mut FROM @myTable) AS S PIVOT (COUNT(Mut) FOR mut IN ([A],[B],[C],[D],[E],[F],[G],[H],[I],[J],[K],[L],[M],[N],[O],[P], [Q],[R],[S],[T],[U],[V],[W],[X],[Y],[Z])) AS pvt 

So you can see from the pivot table that the minimum IDnumbers will be 3,5,7 and 12.

How can the algorithm be implemented? It seems to me that I could find all the combinations (2 ^ 6) that would be sets, and then determine which sets all Mutas have. Then the set with the smallest number of IDnumbers is minimal.

Such brute force may work, but it would be terribly inefficient. My real world is not huge, I have 43 unique Muts (not nine in the example) and ~ 2000 IDnumbers , but I think it will take some time to start, because 2 ^ 2000 is pretty damn big ...

Thanks!

+5
source share
2 answers

Here's an approach that computes a bitmask of mut values ​​for each IDNumber , then uses a recursive CTE to generate all possible combinations that fill the set. The code displays all possible IDNumber combinations that give the final result, excluding those in which one or more IDNumber are redundant in the combination (for example, 1,2,3,4,5,6 in the sample data set).

To convert this to work with your real data, the main difference is that you will almost certainly need to find another mechanism for assigning bit values ​​to Mut values. (I can cheat a little here and calculate the bit values ​​by manipulating the decimal ASCII code for each letter Mut - POWER(2,ASCII(Mut) - 64) ).

 DECLARE @myTable TABLE ( IDnumber int, Mut varchar(1)) INSERT @myTable VALUES (1,'A'),(1,'B'),(1,'C'),(1,'D'), (2,'A'),(2,'C'),(2,'D'), (3,'A'),(3,'B'),(3,'F'),(3,'Z'), (4,'A'),(4,'B'),(4,'E'),(4,'F'), (5,'Y'), (6,'X'),(6,'Z') -- calculate the target bitmask DECLARE @target bigint = ( SELECT SUM(POWER(2,ASCII(Mut) - 64)) FROM (SELECT DISTINCT Mut FROM @myTable) AS x ) ;WITH baseCTE AS ( --calculate a starting bitmask for each ID SELECT IDnumber, sum(mutbit) AS bitmask FROM (SELECT DISTINCT IDnumber, CAST(POWER(2,ASCII(Mut) - 64) AS bigint) AS mutbit FROM @myTable) AS x GROUP BY IDnumber ) ,recCTE AS ( SELECT IDnumber, bitmask, CAST(IdNumber AS varchar(max)) AS trail, 1 AS lev FROM baseCTE UNION ALL SELECT b.IDnumber, b.bitmask | r.bitmask, CAST(CONCAT(r.trail,',',b.IDnumber) AS varchar(max)), r.lev + 1 FROM recCTE AS r JOIN baseCTE AS b ON b.IDnumber > r.IDnumber WHERE b.bitmask | r.bitmask <> r.bitmask --ignore combinations which don't add any new Mut values ) SELECT trail, lev FROM recCTE WHERE bitmask = @target ORDER BY lev 

NB. The SQL Server bitwise operators only work where one or the other value is an integer type, so this solution is limited to 64 different Mut values ​​(where the mask is bigint )) to make it work outside it, d you need to use your own bitwise comparator (maybe using CLR). However, since the question mentions that you have 43 Mut values, it may work for you at the moment.

+1
source

I would like a larger dataset to be tested, but that matches your results, at least for a given dataset ...

 DECLARE @myTable TABLE ( IDnumber INT, Mut VARCHAR(1)) DECLARE @results TABLE ( IDnumber INT) INSERT @myTable VALUES (1,'A'),(1,'B'),(1,'C'),(1,'D'), (2,'A'),(2,'C'),(2,'D'), (3,'A'),(3,'B'),(3,'F'),(3,'Z'), (4,'A'),(4,'B'),(4,'E'),(4,'F'), (5,'Y'), (6,'X'),(6,'Z') DECLARE @IDnumber INT WHILE EXISTS (SELECT 1 FROM @myTable) BEGIN WITH MutRank AS ( -- Find how many IDNumbers are associated with each mut SELECT Mut, COUNT(DISTINCT IDnumber) AS IDnumberCnt FROM @myTable GROUP BY Mut ), MutRarity AS ( -- Translate the IDNumberCnt into a weighted rarity so dupes at -- a IDNumberCnt level reduce their rarity compared to the next level SELECT Mut, RANK() OVER (ORDER BY IDnumberCnt DESC) AS MutRarity FROM MutRank ) -- Grab the IDNumber that is associated to the most "rare" muts SELECT @IDnumber = n.IDnumber FROM (SELECT TOP 1 m.IDnumber, SUM(MutRarity) AS IDNumberValue FROM @myTable m JOIN MutRarity mr ON m.Mut = mr.Mut GROUP BY m.IDnumber ORDER BY IDNumberValue DESC, IDnumber) n -- Save the number in our results INSERT @results (IDnumber) SELECT @IDnumber -- Purge all records for the "covered" muts and the selected IDNumber DELETE m2 FROM @myTable m1 JOIN @myTable m2 ON m1.Mut = m2.Mut AND m1.IDnumber = @IDnumber END SELECT * FROM @results ORDER BY IDnumber 
+2
source

Source: https://habr.com/ru/post/1258920/


All Articles