Linq Performance: Any vs. Contains

Question

Linq Performance: Any vs. Contains

This question is related to this , but not quite the same as me.

Given:

class Foo { public string Bar { get; set; } } ... var c1 = new List<Foo>() { ... }; var c2 = new List<Foo>() { ... };

The following 2 cycles give the same result:

  foreach (var item in c2.Where(f => c1.Any(f1 => f1.Bar.Equals(f.Bar)))) { ... } foreach (var item in c2.Where(f => c1.Select(f1 => f1.Bar).Contains(f.Bar))) { ... }

Are they equally fast?

The difference with another question is whether the optional Select statement adds importance to the importance of the main collection.

In other words: does it contain:

 foos.Contains(foo1)

act on the same “collection view” as this one:

 foos.Select(f=>f.Bar).Contains(foo1.Bar)

My possible -naive-thought event may be: "As soon as we lag behind Linq Select, everything is just“ Lists ”, therefore“ Any ”and“ Contains "O (n)."

+4

performance c # linq

jan Jun 25 '13 at 16:00

source share

2 answers

Your first approach will repeat and compare once and return results.

The second query will be slower because it will iterate and retrieve the Bar property into the collection, and then repeat and compare with f.Bar to create the final result.

0

Rob hardy Jun 25 '13 at 16:09

source share

Servy · Accepted Answer · 2013-06-25T16:24:23+0000

Two queries basically implement the same algorithm. Each of them will iterate c1 for each element in c2 , compare the Bar properties of two objects and return as soon as a match is found. The asymptotic complexity of both cases is the same, which means that they will scale equally well (or, if necessary, equally bad) as the size of these two sets increases. There may be slight differences between them in the overhead associated with one method over another, but the difference will not be significant, and they will be smaller and smaller as the size of the collections increases. There is no real reason to choose one of the two over the other.

There is an option that you have not shown that it is a little faster than any of them. You can use Join to find all the elements in c1 that also exist in c2 without performing a linear search in the sequence:

 var query = from first in c1 join second in c2 on first.Bar equals second.Bar select first;

Another option is to use a HashSet instead of List , as it can be a lot easier to find:

 var set = new HashSet<string>(c1.Select(item => item.Bar)); var query = c2.Where(item => set.Contains(item.Bar));

(This decision is pretty close to what Join will do internally.)

Both of these solutions will be much faster than any of the solutions you offer.

Linq Performance: Any vs. Contains

More articles: