I have two classes that fetch date range data on specific days.
public class IterationLookup<TItem> { private IList<Item> items = null; public IterationLookup(IEnumerable<TItem> items, Func<TItem, TKey> keySelector) { this.items = items.OrderByDescending(keySelector).ToList(); } public TItem GetItem(DateTime day) { foreach(TItem i in this.items) { if (i.IsWithinRange(day)) { return i; } } return null; } } public class LinqLookup<TItem> { private IList<Item> items = null; public IterationLookup(IEnumerable<TItem> items, Func<TItem, TKey> keySelector) { this.items = items.OrderByDescending(keySelector).ToList(); } public TItem GetItem(DateTime day) { return this.items.FirstOrDefault(i => i.IsWithinRange(day)); } }
Then I do speed tests that show that the Linq version is 5 times slower . This makes sense if I store items locally without listing them using ToList . This would make it much slower, because with every call to FirstOrDefault linq would also execute OrderByDescending . But this is not so, so I do not know what is happening. Linq should look very much like an iteration.
This is a piece of code that measures my timings
IList<RangeItem> ranges = GenerateRanges(); // returns List<T> var iterLookup = new IterationLookup<RangeItems>(ranges, r => r.Id); var linqLookup = new LinqLookup<RangeItems>(ranges, r => r.Id); Stopwatch timer = new Stopwatch(); timer.Start(); for(int i = 0; i < 1000000; i++) { iterLookup.GetItem(GetRandomDay()); } timer.Stop(); // display elapsed time timer.Restart(); for(int i = 0; i < 1000000; i++) { linqLookup.GetItem(GetRandomDay()); } timer.Stop(); // display elapsed time
Why do I know that it should work better? Because when I write very similar code without using these search classes, Linq is very similar to foreach iteration ...
// continue from previous code block // items used by both order as they do in classes as well IList<RangeItem> items = ranges.OrderByDescending(r => r.Id).ToList(); timer.Restart(); for(int i = 0; i < 1000000; i++) { DateTime day = GetRandomDay(); foreach(RangeItem r in items) { if (r.IsWithinRange(day)) { // RangeItem result = r; break; } } } timer.Stop(); // display elapsed time timer.Restart(); for(int i = 0; i < 1000000; i++) { DateTime day = GetRandomDay(); items.FirstOrDefault(i => i.IsWithinRange(day)); } timer.Stop(); // display elapsed time
This, in my opinion, is very similar code. FirstOrDefault , as far as I know, iteration continues only until it reaches a valid element or until it reaches the end. And it's somehow the same as foreach with break .
But even the iteration class is worse than my simple foreach iteration loop, which is also a mystery, because all the overhead is a method call inside the class compared to direct access.
Question
What am I doing wrong in my LINQ class, which it runs very slowly?
What am I doing wrong in my Iteration class, so it performs twice as slow as a direct foreach ?
What time is measured?
I am doing the following steps:
- Creating ranges (as shown in the results below)
- Create object instances for IterationLookup, LinqLookup (and my optimized BitCountLookup date range class, which is not part of the discussion here)
- Start the timer and execute 1 million requests on random days within the maximum date range (as can be seen from the results) using the previously created IterationLookup class.
- Start the timer and execute 1 million queries on random days within the maximum date range (as can be seen from the results) using the previously created LinqLookup class.
- Start the timer and execute 1 million requests (6 times) using manual foreach + break loops and Linq calls.
As you can see, the object is not measured .
Appendix I: Results of over a million searches
The ranges displayed in these results do not overlap, which should make both approaches even more similar if the LINQ version does not break the loop upon a successful match (which most likely does).
Generated Ranges:
ID Range 00000000011111111112222222222330000000001111111111222222222222
123456789012345678901234567890112345678901234567890123456789
09.01.01. -30.01. | ------- |
08.01.01. -16.01. | - |
07.16.02.-19.02. | - |
06.01.15. -17.01. | - |
05 19.02.-23.02. | --- |
04.01.01.-07.01. | ----- |
03 02.01.-10.01. | ------- |
02 11.01.-13.01. | - |
01 16.01.-20.01. | --- |
00 29.01.-06.02. | ------- |
Lookup classes ...
- Iteration: 1028ms
- Linq: 4517ms !!! THIS IS THE PROBLEM !!!
- BitCounter: 401ms
Manual loops ...
- Iter: 786ms
- Linq: 981ms
- Iter: 787ms
- Linq: 996ms
- Iter: 787ms
- Linq: 977ms
- Iter: 783ms
- Linq: 979ms
Appendix II: GitHub: Gist code for checking for yourself
I set Gist so that you can get the full code yourself and see what happens. Create a Console application and copy Program.cs into it, add other files that are part of this principle.
Grab it here .
Appendix III: Final Thoughts and Measurement Tests
The most problematic was, of course, LINQ implementatino, which was terribly slow. It turns out this is due to optimizing the delegate compiler. LukeH provided the best and most usable solution that actually made me try different approaches to this. I tried various approaches in the GetItem method (or GetPointData , as it was called in Gist):
the usual way that most developers will do (and implemented in Gist, and also did not update after the results showed that this is not the best way to do this):
return this.items.FirstOrDefault(item => item.IsWithinRange(day));
defining a local predicate variable:
Func<TItem, bool> predicate = item => item.IsWithinRange(day); return this.items.FirstOrDefault(predicate);
local predicate builder:
Func<DateTime, Func<TItem, bool>> builder = d => item => item.IsWithinRange(d); return this.items.FirstOrDefault(builder(day));
local constructor predictor and local predicate variable:
Func<DateTime, Func<TItem, bool>> builder = d => item => item.IsWithinRange(d); Func<TItem, bool> predicate = builder(day); return this.items.FirstOrDefault(predicate);
level predicate builder (static or instance):
return this.items.FirstOrDefault(classLevelBuilder(day));
external predicate and is provided as a parameter of a method
public TItem GetItem(Func<TItem, bool> predicate) { return this.items.FirstOrDefault(predicate); }
while executing this method, I also took two approaches:
The predicate is provided directly when the method is called in the for loop:
for (int i = 0; i < 1000000; i++) { linqLookup.GetItem(item => item.IsWithinRange(GetRandomDay())); }
a predicate builder defined outside a for loop:
Func<DateTime, Func<Ranger, bool>> builder = d => r => r.IsWithinRange(d); for (int i = 0; i < 1000000; i++) { linqLookup.GetItem(builder(GetRandomDay())); }
Results - What Works Best
For comparison, when using an iteration class, it takes approx. 770 ms to execute 1 million requests on randomly generated ranges.
The 3 local predicate builder is the best optimized compiler, so it performs almost as fast as regular iterations. 800ms .
6.2 predicate builder defined outside the for loop: 885 ms
6.1 predicate defined in a for loop: 1525 ms
- all the rest took place between 4200 ms - 4360 ms and therefore are considered unsuitable.
Thus, whenever you use a predicate in a method called externally, you can define a constructor and execute it. This will give the best results.
The biggest surprise for me is that delegates (or predicates) can take a lot of time.