Foreach + break vs linq FirstOrDefault performance difference

I have two classes that fetch date range data on specific days.

public class IterationLookup<TItem> { private IList<Item> items = null; public IterationLookup(IEnumerable<TItem> items, Func<TItem, TKey> keySelector) { this.items = items.OrderByDescending(keySelector).ToList(); } public TItem GetItem(DateTime day) { foreach(TItem i in this.items) { if (i.IsWithinRange(day)) { return i; } } return null; } } public class LinqLookup<TItem> { private IList<Item> items = null; public IterationLookup(IEnumerable<TItem> items, Func<TItem, TKey> keySelector) { this.items = items.OrderByDescending(keySelector).ToList(); } public TItem GetItem(DateTime day) { return this.items.FirstOrDefault(i => i.IsWithinRange(day)); } } 

Then I do speed tests that show that the Linq version is 5 times slower . This makes sense if I store items locally without listing them using ToList . This would make it much slower, because with every call to FirstOrDefault linq would also execute OrderByDescending . But this is not so, so I do not know what is happening. Linq should look very much like an iteration.

This is a piece of code that measures my timings

 IList<RangeItem> ranges = GenerateRanges(); // returns List<T> var iterLookup = new IterationLookup<RangeItems>(ranges, r => r.Id); var linqLookup = new LinqLookup<RangeItems>(ranges, r => r.Id); Stopwatch timer = new Stopwatch(); timer.Start(); for(int i = 0; i < 1000000; i++) { iterLookup.GetItem(GetRandomDay()); } timer.Stop(); // display elapsed time timer.Restart(); for(int i = 0; i < 1000000; i++) { linqLookup.GetItem(GetRandomDay()); } timer.Stop(); // display elapsed time 

Why do I know that it should work better? Because when I write very similar code without using these search classes, Linq is very similar to foreach iteration ...

 // continue from previous code block // items used by both order as they do in classes as well IList<RangeItem> items = ranges.OrderByDescending(r => r.Id).ToList(); timer.Restart(); for(int i = 0; i < 1000000; i++) { DateTime day = GetRandomDay(); foreach(RangeItem r in items) { if (r.IsWithinRange(day)) { // RangeItem result = r; break; } } } timer.Stop(); // display elapsed time timer.Restart(); for(int i = 0; i < 1000000; i++) { DateTime day = GetRandomDay(); items.FirstOrDefault(i => i.IsWithinRange(day)); } timer.Stop(); // display elapsed time 

This, in my opinion, is very similar code. FirstOrDefault , as far as I know, iteration continues only until it reaches a valid element or until it reaches the end. And it's somehow the same as foreach with break .

But even the iteration class is worse than my simple foreach iteration loop, which is also a mystery, because all the overhead is a method call inside the class compared to direct access.

Question

What am I doing wrong in my LINQ class, which it runs very slowly?
What am I doing wrong in my Iteration class, so it performs twice as slow as a direct foreach ?

What time is measured?

I am doing the following steps:

  • Creating ranges (as shown in the results below)
  • Create object instances for IterationLookup, LinqLookup (and my optimized BitCountLookup date range class, which is not part of the discussion here)
  • Start the timer and execute 1 million requests on random days within the maximum date range (as can be seen from the results) using the previously created IterationLookup class.
  • Start the timer and execute 1 million queries on random days within the maximum date range (as can be seen from the results) using the previously created LinqLookup class.
  • Start the timer and execute 1 million requests (6 times) using manual foreach + break loops and Linq calls.

As you can see, the object is not measured .

Appendix I: Results of over a million searches

The ranges displayed in these results do not overlap, which should make both approaches even more similar if the LINQ version does not break the loop upon a successful match (which most likely does).

 Generated Ranges:

 ID Range 00000000011111111112222222222330000000001111111111222222222222
                 123456789012345678901234567890112345678901234567890123456789
 09.01.01. -30.01.  | ------- |
 08.01.01. -16.01.  | - |
 07.16.02.-19.02.  | - |
 06.01.15. -17.01.  | - |
 05 19.02.-23.02.  | --- |
 04.01.01.-07.01. | ----- |
 03 02.01.-10.01.  | ------- |
 02 11.01.-13.01.  | - |
 01 16.01.-20.01.  | --- |
 00 29.01.-06.02.  | ------- |

 Lookup classes ...

 - Iteration: 1028ms
 - Linq: 4517ms !!!  THIS IS THE PROBLEM !!!
 - BitCounter: 401ms

 Manual loops ...

 - Iter: 786ms
 - Linq: 981ms
 - Iter: 787ms
 - Linq: 996ms
 - Iter: 787ms
 - Linq: 977ms
 - Iter: 783ms
 - Linq: 979ms

Appendix II: GitHub: Gist code for checking for yourself

I set Gist so that you can get the full code yourself and see what happens. Create a Console application and copy Program.cs into it, add other files that are part of this principle.

Grab it here .

Appendix III: Final Thoughts and Measurement Tests

The most problematic was, of course, LINQ implementatino, which was terribly slow. It turns out this is due to optimizing the delegate compiler. LukeH provided the best and most usable solution that actually made me try different approaches to this. I tried various approaches in the GetItem method (or GetPointData , as it was called in Gist):

  • the usual way that most developers will do (and implemented in Gist, and also did not update after the results showed that this is not the best way to do this):

     return this.items.FirstOrDefault(item => item.IsWithinRange(day)); 
  • defining a local predicate variable:

     Func<TItem, bool> predicate = item => item.IsWithinRange(day); return this.items.FirstOrDefault(predicate); 
  • local predicate builder:

     Func<DateTime, Func<TItem, bool>> builder = d => item => item.IsWithinRange(d); return this.items.FirstOrDefault(builder(day)); 
  • local constructor predictor and local predicate variable:

     Func<DateTime, Func<TItem, bool>> builder = d => item => item.IsWithinRange(d); Func<TItem, bool> predicate = builder(day); return this.items.FirstOrDefault(predicate); 
  • level predicate builder (static or instance):

     return this.items.FirstOrDefault(classLevelBuilder(day)); 
  • external predicate and is provided as a parameter of a method

     public TItem GetItem(Func<TItem, bool> predicate) { return this.items.FirstOrDefault(predicate); } 

    while executing this method, I also took two approaches:

    • The predicate is provided directly when the method is called in the for loop:

       for (int i = 0; i < 1000000; i++) { linqLookup.GetItem(item => item.IsWithinRange(GetRandomDay())); } 
    • a predicate builder defined outside a for loop:

       Func<DateTime, Func<Ranger, bool>> builder = d => r => r.IsWithinRange(d); for (int i = 0; i < 1000000; i++) { linqLookup.GetItem(builder(GetRandomDay())); } 

Results - What Works Best

For comparison, when using an iteration class, it takes approx. 770 ms to execute 1 million requests on randomly generated ranges.

  • The 3 local predicate builder is the best optimized compiler, so it performs almost as fast as regular iterations. 800ms .

  • 6.2 predicate builder defined outside the for loop: 885 ms

  • 6.1 predicate defined in a for loop: 1525 ms

  • all the rest took place between 4200 ms - 4360 ms and therefore are considered unsuitable.

Thus, whenever you use a predicate in a method called externally, you can define a constructor and execute it. This will give the best results.

The biggest surprise for me is that delegates (or predicates) can take a lot of time.

+47
performance c # foreach linq
Nov 21 2018-11-11T00:
source share
3 answers

In addition to Gabe's Answer , I can confirm that the difference is apparently due to the cost of reconstructing the delegate for each GetPointData call.

If I add one row to the GetPointData method in your IterationRangeLookupSingle class, then it slows down to the same scan level as LinqRangeLookupSingle . Try:

 // in IterationRangeLookupSingle<TItem, TKey> public TItem GetPointData(DateTime point) { // just a single line, this delegate is never used Func<TItem, bool> dummy = i => i.IsWithinRange(point); // the rest of the method remains exactly the same as before // ... } 

(I'm not sure why the compiler and / or jitter cannot just ignore the extra delegate that I added above. Obviously, the delegate is needed in your LinqRangeLookupSingle class.)

One possible solution is to compose a predicate in LinqRangeLookupSingle , so that point is passed to it as an argument. This means that the delegate must be created only once, and not every time the GetPointData method is GetPointData . For example, the following change will speed up the LINQ version so that it is comparable to the foreach version:

 // in LinqRangeLookupSingle<TItem, TKey> public TItem GetPointData(DateTime point) { Func<DateTime, Func<TItem, bool>> builder = x => y => y.IsWithinRange(x); Func<TItem, bool> predicate = builder(point); return this.items.FirstOrDefault(predicate); } 
+8
Nov 22 '11 at 11:17
source share

Sometimes LINQ appears slower, because generating delegates in a loop (especially a non-obvious loop over method calls) can add time. Instead, you may need to move your crawler from the class to make it more general (for example, your key selector is in build mode):

 public class LinqLookup<TItem, TKey> { private IList<Item> items = null; public IterationLookup(IEnumerable<TItem> items, Func<TItem, TKey> keySelector) { this.items = items.OrderByDescending(keySelector).ToList(); } public TItem GetItem(Func<TItem, TKey> selector) { return this.items.FirstOrDefault(selector); } } 

Since you are not using lambda in your iterative code, that might be a little difference, as it should create a delegate on every pass through the loop. Usually this time is not essential for daily coding, and the time of calling a delegate is no more expensive than other method calls, it is just creating a delegate in a narrow loop, which can add a little extra time.

In this case, since the delegate never changes for the class, you can create it outside of the code that you loop in, and it will be more efficient.

Update

Actually, even without any optimization, compilation in release mode on my machine, I do not see a difference of 5 times. I just completed 1,000,000 searches for an Item that has only a DateTime field that lists 5000 items. Of course, my data, etc. Different, but you can see that the time is really very close when you abstract the delegate:

iterative: 14279 ms, 0.014279 ms / call

linq w opt: 17400 ms, 0.0174 ms / call

These temporary differences are very small and deserve understanding and improved usability of LINQ. However, I do not see a difference of 5 times, which makes me believe that we do not see in your test harness.

+14
Nov 21 '11 at 15:53
source share

Suppose you have a loop like this:

 for (int counter = 0; counter < 1000000; counter++) { // execute this 1M times and time it DateTime day = GetRandomDay(); items.FirstOrDefault(i => i.IsWithinRange(day)); } 

This loop will create 1,000,000 lambda objects to call i.IsWithinRange to access day . After each lambda creation, the delegate that calls i.IsWithinRange is called an average of 1,000,000 * items.Length / 2 times. Both of these factors do not exist in your foreach , so the explicit loop is faster.

+6
Nov 21 '11 at 16:01
source share



All Articles