Improve the time complexity of current Linq queries

I have the following lists:

RakeSnapshots, ProductMovements

The goal is to process both and get the number of elements that match the condition, as follows:

  • Consider RakeSnapshots with StatusCode == "Dumping"

  • Consider ProductMovements with Status == "InProgress"

  • Get the count of all elements from both lists that satisfy the condition RakeSnapshots.RakeCode equal to ProductMovements.ProductCode

Below are my current options:

// Code 1:

 var resultCount =  ProductMovements.Where(x => RakeSnapshots
                                                .Where(r => r.StatusCode == "Dumping")
                                                .Any(y => y.RakeCode == x.ProductCode  && 
                                                          x.Status == "InProgress"))
                                                .Count();

// Code 2:

var productMovementsInprogress = ProductMovements.Where(x => x.Status == "InProgress");

var rakeSnapShotsDumping = RakeSnapshots.Where(r => r.StatusCode == "Dumping");

var resultCount = productMovementsInprogress.Zip(rakeSnapShotsDumping,(x,y) => (y.RakeCode == x.ProductCode) ?  true : false)
                                            .Where(x => x).Count();

Both options look like O(n^2) complexity. Is there any way to improve this? It will hurt if the data is very large.

+4
3 answers

You can use an inner join to do this:

var dumpingRakeSnapshots       = rakeSnapshots.Where(r => r.StatusCode == "Dumping");
var inProgressProductMovements = productMovements.Where(p => p.Status == "InProgress");

var matches =
    from r in dumpingRakeSnapshots
    join p in inProgressProductMovements on r.RakeCode equals p.ProductCode
    select r;

int count = matches.Count(); // Here the answer.

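Note that, unlike your code, this yields one result per matching pair, so the count will be higher if the same RakeCode appears more than once in RakeSnapshots.

If you don't want that, use a grouped join.

Here is the Linq query-syntax version, which takes the same grouped-join approach as Ivan's answer (which uses Linq method syntax):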

var matches =
    from p in inProgressProductMovements
    join r in dumpingRakeSnapshots on p.ProductCode equals r.RakeCode into gj
    where gj.Any() // Keep only product movements with at least one matching rake snapshot.
    select p;

Here is a compilable console app that shows the different results you get when RakeCode and ProductCode are not unique:

using System;
using System.Collections.Generic;
using System.Linq;

namespace ConsoleApp1
{
    class RakeSnapshot
    {
        public string StatusCode;
        public string RakeCode;
    }

    class ProductMovement
    {
        public string Status;
        public string ProductCode;
    }

    sealed class Program
    {
        void run()
        {
            var rakeSnapshots = new List<RakeSnapshot>
            {
                new RakeSnapshot {StatusCode = "Dumping", RakeCode = "1"},
                new RakeSnapshot {StatusCode = "Dumping", RakeCode = "1"},
                new RakeSnapshot {StatusCode = "Dumping", RakeCode = "2"}
            };

            var productMovements = new List<ProductMovement>
            {
                new ProductMovement {Status = "InProgress", ProductCode = "1"},
                new ProductMovement {Status = "InProgress", ProductCode = "2"},
                new ProductMovement {Status = "InProgress", ProductCode = "2"}
            };

            var dumpingRakeSnapshots       = rakeSnapshots.Where(r => r.StatusCode == "Dumping");
            var inProgressProductMovements = productMovements.Where(p => p.Status == "InProgress");

            // Inner join.

            var matches1 =
                from r in dumpingRakeSnapshots
                join p in inProgressProductMovements on r.RakeCode equals p.ProductCode
                select r;

            Console.WriteLine(matches1.Count()); // Prints 4: one result per matching pair.

            // Grouped join: one result per product movement that has at least one match.

            var matches2 =
                from p in inProgressProductMovements
                join r in dumpingRakeSnapshots on p.ProductCode equals r.RakeCode into gj
                where gj.Any()
                select p;

            Console.WriteLine(matches2.Count()); // Prints 3.

            // OP code.

            var resultCount = 
                productMovements
                .Count(x => rakeSnapshots
                .Where(r => r.StatusCode == "Dumping")
                .Any(y => y.RakeCode == x.ProductCode && x.Status == "InProgress"));

            Console.WriteLine(resultCount); // Prints 3.
        }

        static void Main(string[] args)
        {
            new Program().run();
        }
    }
}
+3

This sounds like a case for Group Join, which (like Join) is LINQ's hash-based way of correlating two sequences:

var resultCount = ProductMovements.Where(p => p.Status == "InProgress")
    .GroupJoin(RakeSnapshots.Where(r => r.StatusCode == "Dumping"), 
        p => p.ProductCode, r => r.RakeCode, (p, match) => match)
    .Count(match => match.Any());

The time complexity is O(N + M).
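
To see where the O(N + M) comes from, here is a rough sketch of what a hash-based group join amounts to: one pass over the rake snapshots to build a lookup keyed by RakeCode, then one pass over the product movements with a single hash probe each. The two classes and the sample data are made up for illustration, and the ToLookup version is only an approximation of the idea, not the actual LINQ implementation.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical minimal types, assumed for illustration only.
class RakeSnapshot
{
    public string StatusCode = "";
    public string RakeCode = "";
}

class ProductMovement
{
    public string Status = "";
    public string ProductCode = "";
}

static class Program
{
    static void Main()
    {
        // Made-up sample data.
        var rakeSnapshots = new List<RakeSnapshot>
        {
            new RakeSnapshot { StatusCode = "Dumping", RakeCode = "1" },
            new RakeSnapshot { StatusCode = "Dumping", RakeCode = "1" },
            new RakeSnapshot { StatusCode = "Dumping", RakeCode = "2" },
            new RakeSnapshot { StatusCode = "Idle",    RakeCode = "3" }
        };

        var productMovements = new List<ProductMovement>
        {
            new ProductMovement { Status = "InProgress", ProductCode = "1" },
            new ProductMovement { Status = "InProgress", ProductCode = "2" },
            new ProductMovement { Status = "InProgress", ProductCode = "2" },
            new ProductMovement { Status = "InProgress", ProductCode = "3" },
            new ProductMovement { Status = "Completed",  ProductCode = "1" }
        };

        // Pass 1, O(M): index the dumping rake snapshots by RakeCode.
        ILookup<string, RakeSnapshot> dumpingByCode = rakeSnapshots
            .Where(r => r.StatusCode == "Dumping")
            .ToLookup(r => r.RakeCode);

        // Pass 2, O(N): one hash probe per in-progress product movement.
        int viaLookup = productMovements
            .Where(p => p.Status == "InProgress")
            .Count(p => dumpingByCode.Contains(p.ProductCode));

        // The same count expressed with GroupJoin, as in the answer above.
        int viaGroupJoin = productMovements
            .Where(p => p.Status == "InProgress")
            .GroupJoin(rakeSnapshots.Where(r => r.StatusCode == "Dumping"),
                p => p.ProductCode, r => r.RakeCode, (p, match) => match)
            .Count(match => match.Any());

        Console.WriteLine(viaLookup);    // 3
        Console.WriteLine(viaGroupJoin); // 3
    }
}
```

Product code "3" is not counted because its only rake snapshot is not dumping, and the completed movement is filtered out before the probe.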

+3

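It is O(N^2) because you effectively have nested loops: for every element of one list you scan the other list. The usual fix is to replace the inner scan with a lookup that costs O(1) or O(log N).

Strictly speaking it is O(P * R), where P is the number of product movements and R is the number of rake snapshots.

So, taking your code: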

var resultCount =  ProductMovements
    .Where(x => RakeSnapshots
        .Where(r => r.StatusCode == "Dumping")
        .Any(y => y.RakeCode == x.ProductCode  && 
                  x.Status == "InProgress"))
        .Count();

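This is O(P * R) because for each of the P product movements the inner Where scans all R rake snapshots. You can avoid that by precomputing a Dictionary<TKey, TValue> or HashSet<T>, so that the query becomes something like: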

var rakeSnapshotSummary = ... magic happens here ...;
var resultCount =  ProductMovements
    .Where(x => rakeSnapshotSummary[x.ProductCode] == true)
    .Count();

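Building the summary is O(R), each lookup is O(1), and the query over the product movements is O(P), so the total is O(P + R).

So the "magic" could be something like this: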

var rakeSnapshotSummary = new HashSet<string>(RakeSnapshots
    .Where(r => r.StatusCode == "Dumping")
    .Select(r => r.RakeCode));

This creates a HashSet<string>, which gives O(1) time complexity for checking whether a rake code exists. Your final query then looks like:

var resultCount =  ProductMovements
    .Where(x => x.Status == "InProgress" && rakeSnapshotSummary.Contains(x.ProductCode))
    .Count();

Overall this is O(P + R) or, roughly speaking, O(2N), which is O(N).
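
For completeness, here is a small self-contained sketch that runs the original nested query and the HashSet version side by side, so you can check they agree. The class definitions, the sample data and the method names CountNested and CountWithHashSet are assumptions made for the example, not something taken from your code base.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical minimal types, assumed for illustration only.
class RakeSnapshot
{
    public string StatusCode = "";
    public string RakeCode = "";
}

class ProductMovement
{
    public string Status = "";
    public string ProductCode = "";
}

static class Program
{
    // The query from the question: O(P * R), every product movement rescans the rake snapshots.
    static int CountNested(List<ProductMovement> products, List<RakeSnapshot> rakes)
    {
        return products.Count(x => rakes
            .Where(r => r.StatusCode == "Dumping")
            .Any(y => y.RakeCode == x.ProductCode && x.Status == "InProgress"));
    }

    // The HashSet version: O(R) to build the set, then O(P) with one hash lookup per element.
    static int CountWithHashSet(List<ProductMovement> products, List<RakeSnapshot> rakes)
    {
        var dumpingRakeCodes = new HashSet<string>(rakes
            .Where(r => r.StatusCode == "Dumping")
            .Select(r => r.RakeCode));

        return products.Count(x => x.Status == "InProgress"
                                   && dumpingRakeCodes.Contains(x.ProductCode));
    }

    static void Main()
    {
        // Made-up sample data with duplicates and non-matching rows.
        var rakes = new List<RakeSnapshot>
        {
            new RakeSnapshot { StatusCode = "Dumping", RakeCode = "1" },
            new RakeSnapshot { StatusCode = "Dumping", RakeCode = "1" },
            new RakeSnapshot { StatusCode = "Dumping", RakeCode = "2" },
            new RakeSnapshot { StatusCode = "Idle",    RakeCode = "3" }
        };

        var products = new List<ProductMovement>
        {
            new ProductMovement { Status = "InProgress", ProductCode = "1" },
            new ProductMovement { Status = "InProgress", ProductCode = "2" },
            new ProductMovement { Status = "InProgress", ProductCode = "2" },
            new ProductMovement { Status = "InProgress", ProductCode = "3" },
            new ProductMovement { Status = "Completed",  ProductCode = "1" }
        };

        Console.WriteLine(CountNested(products, rakes));      // 3
        Console.WriteLine(CountWithHashSet(products, rakes)); // 3
    }
}
```

Because the set stores each rake code once, duplicate rake snapshots do not inflate the count, which matches the behaviour of Any in the original query.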

+1

Source: https://habr.com/ru/post/1652293/

