Optimization of Join, GroupBy, Distinct, Min / Max on naturally sorted sequences

LINQ statements work under the assumption that input sequences are not sorted, which is great for typical cases. However, the above operators can be more efficient if the source sequences are sorted by key values.

For example, Join reads the entire inner sequence into a hash table and only then iterates over the outer sequence. If two sequences have been sorted, the join can be implemented as a simple merge without additional storage and searching for hash tables.

Is there a library that has alternative high-performance LINQ functions working in pre-sorted sequences?

+3
source share
4 answers

I am developing Nito.LINQ as I have time. It provides ISortedEnumerableand ISortedListsome of you suggested optimizations. I also include more controversial optimizations (e.g. Skipfor IList, which slightly changes the semantics).

0
source

Yes, but not for LINQ to Objects. Most LINQ providers that work with IQueryable<T>are already translating to a native function that can easily use this type of optimization. For example, when working with the Entity Framework, the EF provider will turn this into an SQL call, and DB (hopefully) optimizes this correctly.

LINQ to Objects . ( ) IEqualityComparer<T> IComparer<T>. , "" , .

, LINQ . , , , , ( , Count(), ICollection).

0

, , , , . (, IOrderedEnumerable<T>), LINQ.

, , , , , . IEnumerable<T> , .

An example would be OrderedJoinwhere you need to provide IComparable<TKey>to find out which squat will move forward if the current item in each sequence does not have a match on the key. Here's the signature as a starter. You can tell us when you have implemented your entire library!

public static IEnumerable<TResult> OrderedJoin<TOuter, TInner, TKey, TResult>(
    this IEnumerable<TOuter> outer,
    IEnumerable<TInner> inner,
    Func<TOuter, TKey> outerKeySelector,
    Func<TInner, TKey> innerKeySelector,
    Func<TOuter, TInner, TResult> resultSelector,
    IComparable<TKey> comparer
)
0
source
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;

namespace OrderedJoin
{
    public static class EnumerableExtension
    {
        private enum JoinType
        {
            Inner,
            Left,
            Right,
            Full
        }

        private static IEnumerable<TResult> OrderedJoinIterator<T, TResult>(
            this IEnumerable<T> left, IEnumerable<T> right, Func<T, T, TResult> resultSelector, JoinType jt, IComparer<T> comparer)
        {
            if (left == null) throw new ArgumentNullException("left");
            if (right == null) throw new ArgumentNullException("right");
            if (resultSelector == null) throw new ArgumentNullException("resultSelector");

            if (comparer == null)
                comparer = Comparer<T>.Default;

            var l = left.GetEnumerator();
            var r = right.GetEnumerator();

            var lHasData = l.MoveNext();
            var rHasData = r.MoveNext();

            while (lHasData || rHasData)
            {
                if (!lHasData && rHasData)
                {
                    if (jt == JoinType.Inner || jt == JoinType.Left)
                        yield break;
                    yield return resultSelector(default(T), r.Current);
                    rHasData = r.MoveNext();
                    continue;
                }
                if (!rHasData && lHasData)
                {
                    if (jt == JoinType.Inner || jt == JoinType.Right)
                        yield break;
                    yield return resultSelector(l.Current, default(T));
                    lHasData = l.MoveNext();
                    continue;
                }

                var comp = comparer.Compare(l.Current, r.Current);

                if (comp < 0)
                {
                    if (jt == JoinType.Left || jt == JoinType.Full)
                        yield return resultSelector(l.Current, default(T));
                    lHasData = l.MoveNext();
                }
                else if (comp > 0)
                {
                    if (jt == JoinType.Right || jt == JoinType.Full)
                        yield return resultSelector(default(T), r.Current);
                    rHasData = r.MoveNext();
                }
                else
                {
                    yield return resultSelector(l.Current, r.Current);
                    lHasData = l.MoveNext();
                    rHasData = r.MoveNext();
                }
            }
        }

        public static IEnumerable<TResult> OrderedInnerJoin<T, TResult>(
            this IEnumerable<T> left, IEnumerable<T> right, Func<T, T, TResult> resultSelector, IComparer<T> comparer = null)
        {
            return OrderedJoinIterator(left, right, resultSelector, JoinType.Inner, comparer);
        }

        public static IEnumerable<TResult> OrderedFullJoin<T, TResult>(
            this IEnumerable<T> left, IEnumerable<T> right, Func<T, T, TResult> resultSelector, IComparer<T> comparer = null)
        {
            return OrderedJoinIterator(left, right, resultSelector, JoinType.Full, comparer);
        }

        public static IEnumerable<TResult> OrderedLeftJoin<T, TResult>(
            this IEnumerable<T> left, IEnumerable<T> right, Func<T, T, TResult> resultSelector, IComparer<T> comparer = null)
        {
            return OrderedJoinIterator(left, right, resultSelector, JoinType.Left, comparer);
        }

        public static IEnumerable<TResult> OrderedRightJoin<T, TResult>(
            this IEnumerable<T> left, IEnumerable<T> right, Func<T, T, TResult> resultSelector, IComparer<T> comparer = null)
        {
            return OrderedJoinIterator(left, right, resultSelector, JoinType.Right, comparer);
        }
    }

    internal class TestEnum : IEnumerable<int>
    {
        public TestEnum(string name, IList<int> nums)
        {
            Name = name;
            Nums = nums;
        }

        public string Name { get; private set; }
        public IList<int> Nums { get; private set; }

        public IEnumerator<int> GetEnumerator()
        {
            foreach (var item in Nums)
            {
                Console.WriteLine("{0}: {1}", Name, item);
                yield return item;
            }
        }

        IEnumerator IEnumerable.GetEnumerator()
        {
            return GetEnumerator();
        }
    }

    class Program
    {
        static void Main(string[] args)
        {
            var e1 = new TestEnum("L", new List<int> { 1, 2, 5, 6 });
            var e2 = new TestEnum("R", new List<int> { 1, 3, 4, 6 });

            var print = new Action<IEnumerable<string>>(seq => { foreach (var item in seq) Console.WriteLine("\t" + item); });

            Console.WriteLine("Standard Inner Join:");
            print(e1.Join(e2, i => i, j => j, (i, j) => string.Format("{0} <=> {1}", i, j), EqualityComparer<int>.Default));

            Console.WriteLine("Ordered Inner Join:");
            print(e1.OrderedInnerJoin(e2, (i, j) => string.Format("{0} <=> {1}", i, j)));

            Console.WriteLine("Ordered Full Join:");
            print(e1.OrderedFullJoin(e2, (i, j) => string.Format("{0} <=> {1}", i, j)));

            Console.WriteLine("Ordered Left Join:");
            print(e1.OrderedLeftJoin(e2, (i, j) => string.Format("{0} <=> {1}", i, j)));

            Console.WriteLine("Ordered Right Join:");
            print(e1.OrderedRightJoin(e2, (i, j) => string.Format("{0} <=> {1}", i, j)));

            Console.ReadLine();
        }
    }
}
0
source

Source: https://habr.com/ru/post/1790582/


All Articles