Linq - how do you execute a query for elements in one query source that are not in another?

Question

If I have two query sources, how can I find the elements that are in one but not in the other?

Example of a join that finds the elements present in both:

    var results = from item1 in qs1.Items
                  join item2 in qs2 on item1.field1 equals item2.field2
                  select item1;

So, what would the LINQ code be to return the elements in qs1 that are not in qs2?

+4

c# .net linq .net-3.5

Carlton Jenke Sep 08 '08 at 21:10

5 answers

From Marco Russo:

    NorthwindDataContext dc = new NorthwindDataContext();
    dc.Log = Console.Out;
    var query =
        from c in dc.Customers
        where !(from o in dc.Orders
                select o.CustomerID)
               .Contains(c.CustomerID)
        select c;
    foreach (var c in query)
        Console.WriteLine(c);

+4

Bramha Ghosh Sep 08 '08 at 21:21

Use the Except extension method.

    var items1 = new List<string> { "Apple", "Orange", "Banana" };
    var items2 = new List<string> { "Grapes", "Apple", "Kiwi" };
    var excluded = items1.Except(items2);
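The question compares two sources on different fields (field1 and field2), which a plain Except on the elements does not cover directly. The following is only a sketch under stated assumptions: the types Item1 and Item2 and the helper NotInSecond are hypothetical stand-ins for the question's qs1 and qs2 elements, not anything from the original answers. It also shows that Except behaves as a set difference, so duplicate values in the first sequence collapse to one.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ExceptSketch
{
    // Hypothetical types standing in for the elements of qs1 and qs2.
    public sealed class Item1 { public string Field1 { get; set; } = ""; }
    public sealed class Item2 { public string Field2 { get; set; } = ""; }

    // Returns the elements of 'first' whose Field1 matches no Field2 in 'second'.
    // The keys of 'second' are collected into a HashSet so each lookup is cheap.
    public static List<Item1> NotInSecond(IEnumerable<Item1> first, IEnumerable<Item2> second)
    {
        var keys = new HashSet<string>(second.Select(i => i.Field2));
        return first.Where(i => !keys.Contains(i.Field1)).ToList();
    }

    public static void Main()
    {
        // Plain values: Except is a set difference, so the repeated "Orange"
        // appears only once in the result.
        var items1 = new List<string> { "Apple", "Orange", "Banana", "Orange" };
        var items2 = new List<string> { "Grapes", "Apple", "Kiwi" };
        Console.WriteLine(string.Join(", ", items1.Except(items2)));

        // Different element types compared on a key field.
        var qs1 = new List<Item1> { new Item1 { Field1 = "a" }, new Item1 { Field1 = "b" } };
        var qs2 = new List<Item2> { new Item2 { Field2 = "b" } };
        foreach (var item in NotInSecond(qs1, qs2))
            Console.WriteLine(item.Field1);
    }
}
```

Unlike Except, the key-based helper keeps duplicates from the first source, which may or may not be what you want.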

+4

Darren Kopp Sep 08 '08 at 21:23

Another, completely different way to look at this would be to pass the lambda expression (the condition used to populate the second collection) as a predicate to the first collection.

I know this is not an exact answer to the question; I think other users have already given the correct answer.
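One possible reading of this suggestion, sketched under assumptions: if membership in the second collection is defined by a condition, the negated condition can be applied directly to the first collection, so the second collection never needs to be built. The helper name NotMatching and the even-number condition are hypothetical and chosen only for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PredicateSketch
{
    // Returns the elements of 'first' that do NOT satisfy 'inSecond',
    // the condition assumed to define membership in the second collection.
    public static List<T> NotMatching<T>(IEnumerable<T> first, Func<T, bool> inSecond)
    {
        return first.Where(x => !inSecond(x)).ToList();
    }

    public static void Main()
    {
        // Assumption: the second collection would have been "all even numbers".
        Func<int, bool> isEven = n => n % 2 == 0;
        var first = new List<int> { 1, 2, 3, 4, 5 };

        var result = NotMatching(first, isEven);
        Console.WriteLine(string.Join(", ", result)); // prints "1, 3, 5"
    }
}
```

This only applies when such a condition exists; for two arbitrary collections, the other answers are the better fit.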

+1

Gulzar Nazim Sep 08 '08 at 21:34

Here's a simpler version of the same; you don't need to embed a query:

    List<string> items1 = new List<string>();
    items1.Add("cake");
    items1.Add("cookie");
    items1.Add("pizza");

    List<string> items2 = new List<string>();
    items2.Add("pasta");
    items2.Add("pizza");

    // Negate Contains to keep only the items that are NOT in items2.
    var results = from item in items1
                  where !items2.Contains(item)
                  select item;

    foreach (var item in results)
        Console.WriteLine(item); // Prints 'cake' and 'cookie'

0

Nidonocu Sep 08 '08 at 21:25

Matt Mitchell · Accepted Answer · 2009-05-05T05:25:46+0000

Darren Kopp's answer:

    var excluded = items1.Except(items2);

is the best solution in terms of performance.

(NB: this is true at least for regular LINQ; LINQ to SQL may change things, according to a blog post by Marco Russo. However, I imagine that in the "worst case" Darren Kopp's method will be at least as fast as Russo's method, even in LINQ to SQL.)

As a quick example, try this in LINQPad:

    void Main()
    {
        Random rand = new Random();
        int n = 100000;
        var randomSeq = Enumerable.Repeat(0, n).Select(i => rand.Next());
        var randomFilter = Enumerable.Repeat(0, n).Select(i => rand.Next());

        /* Method 1: Bramha Ghosh's/Marco Russo method */
        (from el1 in randomSeq
         where !(from el2 in randomFilter select el2).Contains(el1)
         select el1).Dump("Result");

        /* Method 2: Darren Kopp method */
        randomSeq.Except(randomFilter).Dump("Result");
    }

Try commenting out one of the two methods at a time and measure the performance for different values of n.

My experience (on my Core 2 Duo laptop) seems to suggest:

    n = 100:       Method 1 takes about 0.05 seconds,         Method 2 takes about 0.05 seconds
    n = 1,000:     Method 1 takes about 0.6 seconds,          Method 2 takes about 0.4 seconds
    n = 10,000:    Method 1 takes about 2.5 seconds,          Method 2 takes about 0.425 seconds
    n = 100,000:   Method 1 takes about 20 seconds,           Method 2 takes about 0.45 seconds
    n = 1,000,000: Method 1 takes about 3 minutes 25 seconds, Method 2 takes about 1.3 seconds

Method 2 (Darren Kopp's answer) is clearly faster.

The slowdown of method 2 for larger n is most likely due to the generation of the random data (feel free to insert a DateTime diff to confirm this), whereas method 1 clearly has a problem with algorithmic complexity: just by looking at it you can see that it is at least O(N^2), since each number in the first collection is compared against the entire second collection.
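To separate the cost of data generation from the cost of the query itself, one option is to materialise both sequences first and time only the two methods. The sketch below is not the original LINQPad script: it swaps in System.Diagnostics.Stopwatch instead of a DateTime diff, uses a fixed seed, and the names TimingSketch, ViaContains and ViaExcept are hypothetical. Timings will vary by machine, so none are claimed here.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

public static class TimingSketch
{
    // Method 1: for every element, List.Contains scans the whole filter list,
    // so the total work grows roughly with seq.Count * filter.Count.
    public static List<int> ViaContains(List<int> seq, List<int> filter)
    {
        return (from el1 in seq
                where !filter.Contains(el1)
                select el1).ToList();
    }

    // Method 2: Except performs a set difference over the two sequences.
    public static List<int> ViaExcept(List<int> seq, List<int> filter)
    {
        return seq.Except(filter).ToList();
    }

    public static void Main()
    {
        var rand = new Random(42); // fixed seed, an assumption for repeatability
        int n = 10000;

        // ToList() forces generation now, so it is excluded from the timings.
        var seq = Enumerable.Range(0, n).Select(i => rand.Next()).ToList();
        var filter = Enumerable.Range(0, n).Select(i => rand.Next()).ToList();

        var sw = Stopwatch.StartNew();
        var r1 = ViaContains(seq, filter);
        sw.Stop();
        Console.WriteLine("Method 1: {0} ms, {1} results", sw.ElapsedMilliseconds, r1.Count);

        sw.Restart();
        var r2 = ViaExcept(seq, filter);
        sw.Stop();
        Console.WriteLine("Method 2: {0} ms, {1} results", sw.ElapsedMilliseconds, r2.Count);
    }
}
```

Note that the two methods are not strictly equivalent: Except also removes duplicates from the first sequence, while the Contains version keeps them.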

Conclusion: use Darren Kopp's answer, the LINQ 'Except' method.
