Optimization of OrderBy () when using Any ()

So, I have a pretty standard LINQ-to-Object installation.

var query = expensiveSrc.Where(x=> x.HasFoo) .OrderBy(y => y.Bar.Count()) .Select(z => z.FrobberName); // ... if (!condition && !query.Any()) return; // seems to enumerate and sort entire enumerable // ... foreach (var item in query) // ... 

It lists all two times. What is wrong.

 var queryFiltered = expensiveSrc.Where(x=> x.HasFoo); var query = queryFiltered.OrderBy(y => y.Bar.Count()) .Select(z => z.FrobberName); if (!condition && !queryFiltered.Any()) return; // ... foreach (var item in query) // ... 

It works, but is there a better way?

Will there be any crazy way to enlighten Any () to get around maintenance-free operations? I think I remember that such an optimization goes to EduLinq.

+6
source share
7 answers

There is little information that can be extracted from the enumerated, so it might be better to include the query in IQueryable? This Any extension method skips its expression tree, skipping all irrelevant operations, then turns the important branch into a delegate that you can call to get optimized IQueryable. The standard Any method is applied to it explicitly to avoid recursion. Not sure about corner cases, and it might make sense to cache compiled queries, but with simple queries like yours, it seems to work.

 static class QueryableHelper { public static bool Any<T>(this IQueryable<T> source) { var e = source.Expression; while (e is MethodCallExpression) { var mce = e as MethodCallExpression; switch (mce.Method.Name) { case "Select": case "OrderBy": case "ThenBy": break; default: goto dun; } e = mce.Arguments.First(); } dun: var d = Expression.Lambda<Func<IQueryable<T>>>(e).Compile(); return Queryable.Any(d()); } } 

The queries themselves should be modified as follows:

 var query = expensiveSrc.AsQueryable() .Where(x=> x.HasFoo) .OrderBy(y => y.Bar.Count()) .Select(z => z.FrobberName); 
+2
source

Why not just get rid of the excess:

 if (!query.Any()) return; 

This does not seem to work, even without it, the foreach body will not be executed if the query fails. Thus, when checking Any() you do not save anything in the fast path and list the slow path twice.

On the other hand, if you need to know if any results were found after the loop ended, you can simply use the flag:

 bool itemFound = false; foreach (var item in query) { itemFound = true; ... // Rest of the loop body goes here. } if(itemFound) { // ... } 

Or you can use the enumerator directly if you really care about over setting the flag in the body of the loop:

 using(var erator = query.GetEnumerator()) { bool itemFound = erator.MoveNext(); if(itemFound) { do { // Do something with erator.Current; } while(erator.MoveNext()) } // Do something with itemFound } 
+9
source

Change (revised):. This answer causes a problem with a query being executed twice, which in my opinion is a key problem. The following shows why:

Making Any() smarter is something that only Linq, IMO performers can do ... Or it would be a dirty adventure using reflection.

Using the class as shown below, you can cache the output of the original enumeration and enumerate it twice:

 public class CachedEnumerable<T> { public CachedEnumerable(IEnumerable<T> enumerable) { _source = enumerable.GetEnumerator(); } public IEnumerable<T> Enumerate() { int itemIndex = 0; while (true) { if (itemIndex < _cache.Count) { yield return _cache[itemIndex]; itemIndex++; continue; } if (!_source.MoveNext()) yield break; var current = _source.Current; _cache.Add(current); yield return current; itemIndex++; } } private List<T> _cache = new List<T>(); private IEnumerator<T> _source; } 

This way you keep the lazy aspect of LINQ, keep it readable and universal. This will be slower than directly using IEnumerator<> . There are many possibilities for expanding and optimizing this class, such as the policy of abandoning old elements, getting rid of coroutines, etc. But this is not the case, I think.

Oh, and the class is not thread safe as it is now. This was not asked, but I can imagine how people are trying. I think that this could easily be added if the source enumerated does not have downstream proximity.

Why is this optimal?

Consider two options: an enumeration may contain elements or not.

  • If it contains elements, this approach is optimal, since the request is run only once.
  • If there are no elements in it, you will be tempted to exclude OrderBy and select a part of your queries, because they add no matter. But .. if after the Where() clause there are null elements, for sorting the null elements will cost zero time (well, almost). The same goes for the Select() clause.

What if it is not fast enough yet? In this case, my strategy would be to bypass Linq. Now, I really love linq, but this elegance comes at a price. Therefore, for every 100 times you use Linq, there will usually be one or two calculations that are important to perform very quickly, which I write with the good old for loops and lists. Part of mastering technology is recognizing where it does not fit. Linq is no exception to this rule.

+2
source

Will there be any crazy way to enlighten Any () to get around maintenance-free operations? I think I remember that such an optimization goes to EduLinq.

Well, I'm not going to ignore any question that Edulin mentions :)

In this case, Edulinq can be faster than LINQ for objects, because its OrderBy implementation is as lazy as it can be - it only sorts as much as it takes to get the returned elements.

However, in principle, he still needs to read the entire sequence before it returns anything. In the end, the last element in the sequence may be the first to be returned.

If you control the entire stack, you can make Any() detect that it is called in your "known" IOrderedEnumerable implementation, and go straight to the source. Note that this does lead to a change in the observed behavior, although if the iteration throughout the sequence throws an exception (or has some other side effect), then this side effect will be lost during optimization. You could argue that everything is fine, of course, what is considered the β€œright” optimization in LINQ is a rather complicated area.

Another possibility, which is pretty terrible, but which would solve this particular problem, would be to force the iterator returned from IOrderedEnumerable to simply take the first MoveNext() value from the source. This is enough for the normal implementation of Any , and at this point we do not need to know what the first element is. We can defer the actual sort until the Current property is used.

This is a rather special optimization, although I would be careful in implementation. I think Ani's approach is the best - just use the fact that repetition with query using foreach will never go into the loop body if the query results are empty.

+2
source

Try the following:

 var items = expensiveSrc.Where(x=> x.HasFoo) .OrderBy(y => y.Bar.Count()) .Select(z => z.FrobberName).ToList(); // ... if (!condition && items.Count == 0) return; // Just check the count // ... foreach (var item in items) // ... 

The request is executed only once.

+1
source

but i have lost streaming / lazy loading that half linq point

Lazy loading (delayed execution) and 2 LINQ queries with disparate results cannot be optimized (reduced) to 1 query execution.

+1
source

why don't you use .ToArray ()

 var query = expensiveSrc.Where(x=> x.HasFoo) .OrderBy(y => y.Bar.Count()) .Select(z => z.FrobberName).ToArray(); 

if there are no elements, sorting and selection should not give a large overhead. if you sort, then you still need a cache where to store the data, so .ToArray does not produce so many invoices. if you decompile the OrderedEnumerable class, you will find that the int [] array containing the links is formed, so you simply create a new array of links using .ToArray (or .ToList).

BUT if the roadSrc comes from a database, other strategies might be better. if the order can be executed in the database, this will give you a lot of overhead, because the data is stored twice.

0
source

Source: https://habr.com/ru/post/906716/


All Articles