How can I find a set of elements that occur in 2 or more sequences in a sequence of sequences?
In other words, I want different values to occur in at least 2 of the transmitted sequences.
Note: This is not the intersection of all sequences, but rather the union of the intersection of all pairs of sequences.
Note 2: A pair does not include a pair or two combinations of a sequence with itself. That would be stupid.
I made an attempt myself
public static IEnumerable<T> UnionOfIntersects<T>( this IEnumerable<IEnumerable<T>> source) { var pairs = from s1 in source from s2 in source select new { s1 , s2 }; var intersects = pairs .Where(p => p.s1 != p.s2) .Select(p => p.s1.Intersect(p.s2)); return intersects.SelectMany(i => i).Distinct(); }
but I am concerned that this might not be optimal, I think that it includes the intersections of pairs A, B and pairs B, A, which seem ineffective. I also think that there may be a more efficient way to join sets as they repeat.
I include a few input and output examples below:
{ { 1, 1, 2, 3, 4, 5, 7 }, { 5, 6, 7 }, { 2, 6, 7, 9 } , { 4 } }
returns
{ 2, 4, 5, 6, 7 }
and
{ { 1, 2, 3} } or { {} } or { }
returns
{ }
I am looking for the best combination of readability and potential performance.
EDIT
I did the initial testing of current answers, my code is here . The conclusion is below.
Original valid:True DoomerOneLine valid:True DoomerSqlLike valid:True Svinja valid:True Adricadar valid:True Schmelter valid:True Original 100000 iterations in 82ms DoomerOneLine 100000 iterations in 58ms DoomerSqlLike 100000 iterations in 82ms Svinja 100000 iterations in 1039ms Adricadar 100000 iterations in 879ms Schmelter 100000 iterations in 9ms
At the moment, it seems that Tim Schmelter’s answer works better, at least by an order of magnitude.