Here is an option that does not use a cursor but uses a single recursive query.
Essentially, it processes the data as edges in a graph and recursively recursively recursively relays all edges of the graph, stopping when a loop is detected. Then he puts all the loops found into groups and gives each group a number.
See detailed explanations of how this works below. I recommend that you run a CTE-by-CTE request and examine each intermediate result to understand what it is doing.
Example 1
DECLARE @T TABLE (ID int, Ident1 char(1), Ident2 char(1)); INSERT INTO @T (ID, Ident1, Ident2) VALUES (1, 'a', 'a'), (2, 'b', 'b'), (3, 'c', 'a'), (4, 'c', 'b'), (5, 'c', 'c');
Example 2
I added another row with z value to have multiple rows with unpaired values.
DECLARE @T TABLE (ID int, Ident1 char(1), Ident2 char(1)); INSERT INTO @T (ID, Ident1, Ident2) VALUES (1, 'a', 'a'), (1, 'a', 'c'), (2, 'b', 'f'), (3, 'a', 'g'), (4, 'c', 'h'), (5, 'b', 'j'), (6, 'd', 'f'), (7, 'e', 'k'), (8, 'i', NULL), (88, 'z', 'z'), (9, 'l', 'h');
Example 3
DECLARE @T TABLE (ID int, Ident1 char(1), Ident2 char(1)); INSERT INTO @T (ID, Ident1, Ident2) VALUES (1, 'a', 'f'), (2, 'a', 'g'), (3, 'a', NULL), (4, 'b', 'c'), (5, 'b', 'a'), (6, 'b', 'h'), (7, 'b', 'j'), (8, 'b', NULL), (9, 'b', NULL), (10, 'b', 'g'), (11, 'c', 'k'), (12, 'c', 'b'), (13, 'd', 'l'), (14, 'd', 'f'), (15, 'd', 'g'), (16, 'd', 'm'), (17, 'd', 'a'), (18, 'd', NULL), (19, 'd', 'a'), (20, 'e', 'c'), (21, 'e', 'b'), (22, 'e', NULL);
Query
WITH CTE_Idents AS ( SELECT Ident1 AS Ident FROM @T UNION SELECT Ident2 AS Ident FROM @T ) ,CTE_Pairs AS ( SELECT Ident1, Ident2 FROM @T WHERE Ident1 <> Ident2 UNION SELECT Ident2 AS Ident1, Ident1 AS Ident2 FROM @T WHERE Ident1 <> Ident2 ) ,CTE_Recursive AS ( SELECT CAST(CTE_Idents.Ident AS varchar(8000)) AS AnchorIdent , Ident1 , Ident2 , CAST(',' + Ident1 + ',' + Ident2 + ',' AS varchar(8000)) AS IdentPath , 1 AS Lvl FROM CTE_Pairs INNER JOIN CTE_Idents ON CTE_Idents.Ident = CTE_Pairs.Ident1 UNION ALL SELECT CTE_Recursive.AnchorIdent , CTE_Pairs.Ident1 , CTE_Pairs.Ident2 , CAST(CTE_Recursive.IdentPath + CTE_Pairs.Ident2 + ',' AS varchar(8000)) AS IdentPath , CTE_Recursive.Lvl + 1 AS Lvl FROM CTE_Pairs INNER JOIN CTE_Recursive ON CTE_Recursive.Ident2 = CTE_Pairs.Ident1 WHERE CTE_Recursive.IdentPath NOT LIKE CAST('%,' + CTE_Pairs.Ident2 + ',%' AS varchar(8000)) ) ,CTE_RecursionResult AS ( SELECT AnchorIdent, Ident1, Ident2 FROM CTE_Recursive ) ,CTE_CleanResult AS ( SELECT AnchorIdent, Ident1 AS Ident FROM CTE_RecursionResult UNION SELECT AnchorIdent, Ident2 AS Ident FROM CTE_RecursionResult ) SELECT CTE_Idents.Ident ,CASE WHEN CA_Data.XML_Value IS NULL THEN CTE_Idents.Ident ELSE CA_Data.XML_Value END AS GroupMembers ,DENSE_RANK() OVER(ORDER BY CASE WHEN CA_Data.XML_Value IS NULL THEN CTE_Idents.Ident ELSE CA_Data.XML_Value END ) AS GroupID FROM CTE_Idents CROSS APPLY ( SELECT CTE_CleanResult.Ident+',' FROM CTE_CleanResult WHERE CTE_CleanResult.AnchorIdent = CTE_Idents.Ident ORDER BY CTE_CleanResult.Ident FOR XML PATH(''), TYPE ) AS CA_XML(XML_Value) CROSS APPLY ( SELECT CA_XML.XML_Value.value('.', 'NVARCHAR(MAX)') ) AS CA_Data(XML_Value) WHERE CTE_Idents.Ident IS NOT NULL ORDER BY Ident;
Result 1
+-------+--------------+---------+ | Ident | GroupMembers | GroupID | +-------+--------------+---------+ | a | a,b,c, | 1 | | b | a,b,c, | 1 | | c | a,b,c, | 1 | +-------+--------------+---------+
Result 2
+-------+--------------+---------+ | Ident | GroupMembers | GroupID | +-------+--------------+---------+ | a | a,c,g,h,l, | 1 | | b | b,d,f,j, | 2 | | c | a,c,g,h,l, | 1 | | d | b,d,f,j, | 2 | | e | e,k, | 3 | | f | b,d,f,j, | 2 | | g | a,c,g,h,l, | 1 | | h | a,c,g,h,l, | 1 | | i | i | 4 | | j | b,d,f,j, | 2 | | k | e,k, | 3 | | l | a,c,g,h,l, | 1 | | z | z | 5 | +-------+--------------+---------+
Result 3
+-------+--------------------------+---------+ | Ident | GroupMembers | GroupID | +-------+--------------------------+---------+ | a | a,b,c,d,e,f,g,h,j,k,l,m, | 1 | | b | a,b,c,d,e,f,g,h,j,k,l,m, | 1 | | c | a,b,c,d,e,f,g,h,j,k,l,m, | 1 | | d | a,b,c,d,e,f,g,h,j,k,l,m, | 1 | | e | a,b,c,d,e,f,g,h,j,k,l,m, | 1 | | f | a,b,c,d,e,f,g,h,j,k,l,m, | 1 | | g | a,b,c,d,e,f,g,h,j,k,l,m, | 1 | | h | a,b,c,d,e,f,g,h,j,k,l,m, | 1 | | j | a,b,c,d,e,f,g,h,j,k,l,m, | 1 | | k | a,b,c,d,e,f,g,h,j,k,l,m, | 1 | | l | a,b,c,d,e,f,g,h,j,k,l,m, | 1 | | m | a,b,c,d,e,f,g,h,j,k,l,m, | 1 | +-------+--------------------------+---------+
How it works
I am using the second sample dataset for this explanation.
CTE_Idents
CTE_Idents contains a list of all Identifiers that appear in the columns Ident1 and Ident2 . Since they can appear in any order, we will be together both columns. UNION also removes any duplicates.
+-------+ | Ident | +-------+ | NULL | | a | | b | | c | | d | | e | | f | | g | | h | | i | | j | | k | | l | | z | +-------+
CTE_Pairs
CTE_Pairs gives a list of all edges of the graph in both directions. Again, UNION used to remove any duplicates.
+--------+--------+ | Ident1 | Ident2 | +--------+--------+ | a | c | | a | g | | b | f | | b | j | | c | a | | c | h | | d | f | | e | k | | f | b | | f | d | | g | a | | h | c | | h | l | | j | b | | k | e | | l | h | +--------+--------+
CTE_Recursive
CTE_Recursive - the main part of the query that recursively traverses the graph, starting with each unique Identifier. These start lines are created by the first part of UNION ALL . The second part of UNION ALL recursively connects to itself, linking Ident2 to Ident1 . Since we previously created CTE_Pairs with all edges written in both directions, we can always associate only Ident2 with Ident1 , and we will get all the paths in the graph. At the same time, the query builds IdentPath , a string of comma-separated identifiers that have been passed so far. It is used in the WHERE filter:
CTE_Recursive.IdentPath NOT LIKE CAST('%,' + CTE_Pairs.Ident2 + ',%' AS varchar(8000))
As soon as we encounter an identifier that was previously included in Path, recursion stops when the list of connected nodes is exhausted. AnchorIdent is the initial recursion identifier; it will be used later to group the results. Lvl is not actually used, I turned it on to better understand what is happening.
+-------------+--------+--------+-------------+-----+ | AnchorIdent | Ident1 | Ident2 | IdentPath | Lvl | +-------------+--------+--------+-------------+-----+ | a | a | c | ,a,c, | 1 | | a | a | g | ,a,g, | 1 | | b | b | f | ,b,f, | 1 | | b | b | j | ,b,j, | 1 | | c | c | a | ,c,a, | 1 | | c | c | h | ,c,h, | 1 | | d | d | f | ,d,f, | 1 | | e | e | k | ,e,k, | 1 | | f | f | b | ,f,b, | 1 | | f | f | d | ,f,d, | 1 | | g | g | a | ,g,a, | 1 | | h | h | c | ,h,c, | 1 | | h | h | l | ,h,l, | 1 | | j | j | b | ,j,b, | 1 | | k | k | e | ,k,e, | 1 | | l | l | h | ,l,h, | 1 | | l | h | c | ,l,h,c, | 2 | | l | c | a | ,l,h,c,a, | 3 | | l | a | g | ,l,h,c,a,g, | 4 | | j | b | f | ,j,b,f, | 2 | | j | f | d | ,j,b,f,d, | 3 | | h | c | a | ,h,c,a, | 2 | | h | a | g | ,h,c,a,g, | 3 | | g | a | c | ,g,a,c, | 2 | | g | c | h | ,g,a,c,h, | 3 | | g | h | l | ,g,a,c,h,l, | 4 | | f | b | j | ,f,b,j, | 2 | | d | f | b | ,d,f,b, | 2 | | d | b | j | ,d,f,b,j, | 3 | | c | h | l | ,c,h,l, | 2 | | c | a | g | ,c,a,g, | 2 | | b | f | d | ,b,f,d, | 2 | | a | c | h | ,a,c,h, | 2 | | a | h | l | ,a,c,h,l, | 3 | +-------------+--------+--------+-------------+-----+
CTE_CleanResult
CTE_CleanResult leaves only the relevant parts from CTE_Recursive and reunites both Ident1 and Ident2 using UNION .
+-------------+-------+ | AnchorIdent | Ident | +-------------+-------+ | a | a | | a | c | | a | g | | a | h | | a | l | | b | b | | b | d | | b | f | | b | j | | c | a | | c | c | | c | g | | c | h | | c | l | | d | b | | d | d | | d | f | | d | j | | e | e | | e | k | | f | b | | f | d | | f | f | | f | j | | g | a | | g | c | | g | g | | g | h | | g | l | | h | a | | h | c | | h | g | | h | h | | h | l | | j | b | | j | d | | j | f | | j | j | | k | e | | k | k | | l | a | | l | c | | l | g | | l | h | | l | l | +-------------+-------+
Final SELECT
Now we need to build a comma-separated string of Ident values ββfor each AnchorIdent . CROSS APPLY with FOR XML does this. DENSE_RANK() calculates GroupID numbers for each AnchorIdent .