Quicksort with Linq Advantage performance passing through T [] compared to IEnumerable <T>
Just for fun, I built a quicksort implementation in C # with Linq:
public static IEnumerable<T> quicksort<T>(IEnumerable<T> input) where T : IComparable<T>{ if (input.Count() <= 1) return input; var pivot = input.FirstOrDefault(); var lesser = quicksort(input.Skip(1).Where(i => i.CompareTo(pivot) <= 0)); var greater = quicksort(input.Where(i => i.CompareTo(pivot) > 0)); return lesser.Append(pivot).Concat(greater); } Sorts 10,000 random numbers in 13 seconds.
Changing it to use int [] instead of List gives about 700 times more performance! Sorting the same 10,000 random numbers requires only 21 ms.
public static T[] quicksortArray<T>(T[] input) where T : IComparable<T>{ if (input.Count() <= 1) return input; var pivot = input.FirstOrDefault(); var lesser = quicksortArray(input.Skip(1).Where(i => i.CompareTo(pivot) <= 0).ToArray()); var greater = quicksortArray(input.Where(i => i.CompareTo(pivot) > 0).ToArray()); return lesser.Append(pivot).Concat(greater).ToArray(); } Just by looking at the code, I would suggest that this would be a worse view. I assumed that .ToArray () would create an additional array in memory and copy all integers there. I think that passing an array compared to passing a list should take about the same time.
So where does this huge performance difference come from?
This is why you have to be very careful in repeating IEnumerable several times.
The first time you call quicksort you go through, say, List . It calls quicksort two more times, in each of these cases the IEnumerable you pass in is a query that will skip the first element and then perform a comparison on each individual element after that. Then you take this query and pass it to two more instances of quicksort , but at the same time, the query not only skips the first element and compares each element after it, but then skips the first element of the results of this query, and then compares each thing after that something. This means that by the time you finally reach the value, you will get a query that represents log (n), skips and compares each element in the sequence with the value of log (n) times.
In your second implementation, you do not pass quicksort requests, you evaluate these requests to their values ββand pass the values ββto the operation, which then can use these values ββtwice, and not constantly execute an extremely complex request several times.
Regarding linq queries, they are lazy, they will not be evaluated until you name a method like ToArray or ToList . For example, in your first code, this query:
input.Skip(1).Where(i => i.CompareTo(pivot)) Will be evaluated at least twice each time you call quicksort , once for Count() and once for FirstOrDefault . This means that the filtering operation will be repeated for each call again and again. When you use ToArray , on the other hand, since you already have filtered elements in the array, Where will not be executed every time, it will be executed after calling ToArray and it. This is the difference between the codes on the basis of which you can do the math for other parts.