Linq IEnumerable Extension Method - How to increase productivity?

I wrote the following extension method, which looks for a sequence of consecutive elements that satisfy the predicate passed to it. The number of consecutive elements in a sequence is determined by the "sequenceSize" parameter.

As an example, I might have an IEnumerable of integers, and I want to find 10 consecutive values ​​that are greater than 100. This extension method will determine if such a sequence exists.

This method works well. But because of what it should do, it can be slow if there are a significant number of elements in IEnumerable, because it must start from the first element, look for consecutive values ​​that satisfy the predicate, and then go to the second element and do the same etc.

I am looking for suggestions on how to speed this up. I tried to use AsParallel (), but that did not affect.

public static IEnumerable<IEnumerable<T>> FindSequenceConsecutive<T>(this IEnumerable<T> sequence, Predicate<T> predicate, int sequenceSize) { IEnumerable<T> current = sequence; while (current.Count() > sequenceSize) { IEnumerable<T> window = current.Take(sequenceSize); if (window.Where(x => predicate(x)).Count() >= sequenceSize) yield return window; current = current.Skip(1); } } 
+4
source share
3 answers

I believe that this solution will provide better performance and will be better scaled as the sequence increases, since it does not allocate any additional buffers (lists or queues) and should not convert the result to a list or do anything that counts on the result buffer. In addition, it only iterates over the sequence.

 public static IEnumerable<IEnumerable<T>> FindSequenceConsecutive<T>(this IEnumerable<T> sequence, Predicate<T> predicate, int sequenceSize) { IEnumerable<T> window = Enumerable.Repeat(default(T), 0); int count = 0; foreach (var item in sequence) { if (predicate(item)) { window = window.Concat(Enumerable.Repeat(item, 1)); count++; if (count == sequenceSize) { yield return window; window = window.Skip(1); count--; } } else { count = 0; window = Enumerable.Repeat(default(T), 0); } } } 
+3
source

The most likely reason for this method being slow is the .Count() call, which immediately lists the sequence to determine the number of elements.

You will probably be better off explicitly checking the criteria and tracking the counts, rather than using Where() and Count() several times.

In general, this method lists the sequence a lot. You can experience good speed if you call .ToList() to enumerate the sequence once, and then do your operations in the generated list. (Note that this will not work if you plan to use this method for sequences of infinite length.)

Update . You are testing >= sequenceSize , although window.Count() == sequenceSize . In other words, you just need All() :

 if (window.All(x => predicate(x))) yield return window; 

Not sure how much this will help, but at least semantically understandable.

Further editing . Consider this method:

 public static IEnumerable<IEnumerable<T>> FindSequenceConsecutive<T>(this IEnumerable<T> sequence, Predicate<T> predicate, int sequenceSize) { List<T> list = sequence.ToList(); List<bool> matchList = list.Select(x => predicate(x)).ToList(); int start = 0; int count = list.Count; while (start + sequenceSize <= count) { var range = matchList.GetRange(start, sequenceSize); if (range.All(x => x)) yield return list.GetRange(start, sequenceSize); start++; } } 

It evaluates the sequence once, and then breaks up the list of necessary ones.

+5
source

I think something like this might work for you, since you can go through the list once and basically maintain a queue of consecutive elements that pass the predicate, clear (all), and de-emulate (one) as needed.

 public static IEnumerable<IEnumerable<T>> FindSequenceConsecutive<T>(this IEnumerable<T> sequence, Predicate<T> predicate, int sequenceSize) { var queue = new Queue<T>(); foreach (T item in sequence) { if (predicate(item)) { queue.Enqueue(item); if (queue.Count == sequenceSize) { yield return queue.ToList(); queue.Dequeue(); } } else { queue.Clear(); } } } 

So I write

 int[] array = { 1, 2, 3, 4, 5, 2, 8, 3, 5, 6 }; foreach (var seq in array.FindSequenceConsecutive(i => i > 2, 3)) { Console.WriteLine(string.Join(",", seq)); } 

Profitability

 3,4,5 8,3,5 3,5,6 
+4
source

Source: https://habr.com/ru/post/1369033/


All Articles