Linq and Binary Search - Improve This Slow Where Instruction?

I have two collections, each of which contains about 40,000 items.

Elements in list 2 are associated with elements of list 1 through a foreign key.

For each list item one, I want to find the corresponding item in list two.

Something like that:

foreach(var item in list1)
{
  var match = list2.Where(child => child.ID == item.ChildID).FirstOrDefault();
  item.Child = match;
}

It works, but it's slow as hell.

Now both lists1 and list2 are sorted by these keys from the database. Thus, list1 is sorted by ChildID, and list2 is sorted by ID (same value).

I think binary search would speed it up significantly, but I read somewhere that Linq will choose the most appropriate strategy for the list in the Where clause. Maybe I need to explicitly point to a sorted list? Or maybe I need to implement my own binary search algorithm with a comparator?

Any ideas appreciated.

Thanks.

+3
source share
6 answers

Why not use a connection?

var query = 
   from a in list1
   join b in list2 on a.ChildID equals b.ID
   select new {Item1 = a, Item2 = b};

foreach(var item in query)
{
   item.Item1.Child = item.Item2;
}
+11
source

I had this problem before, LINQ search is very slow compared to DB, because it does not use any index.

Do you find using a dictionary instead of a list?

, "" ContainsKey, , .

:

Dictionary<int, Child> list2 = ...;

...

foreach(var item in list1)
{
  if (list2.ContainsKey(item.ChildID))
    item.Child = list2[item.ChildID];
}

, , , .

+1

:

        var joined = list1.Join(list2, x => x.ChildID, x => x.ID, (x, y) => new { x, y });

        foreach (var j in joined)
        {
            j.x.Child = j.y;
        }

?

+1

, :

int index1 = 0, index2 = 0;
while (index1 < list1.Count && index2 < list2.Count) {
   while (index1 < list1.Count && list1[index1].ChildId < list2[index2].Id) index1++;
   if (index1 < list1.Count) {
      while (index2 < list2.Count && list2[index2].Id < list1[index1].ChildId) index2++;
      if (index2 < list2.Count && list1[index1].ChildId == list2[index2].Id) {
         list1[index].Child = list2[index2];
         index1++;
         index2++;
      }
   }
}

int index1 = 0, index2 = 0;
while (index1 < list1.Count && index2 < list2.Count) {
   if (list1[index1].ChildId == list2[index2].Id) {
      list1[index].Child = list2[index2];
      index1++;
      index2++;
   } else {
      if (list1[index1].ChildId < list2[index2].Id) {
         index1++;
      } else {
         index2++;
      }
   }
}

, , :

Dictionary<int, TypeOfChild> index = new Dictionary<int, TypeOfChild>();
foreach (TypeOfChild child in list2) {
   index.Add(child.Id, child);
}
foreach (TypeOfParent parent in list1) {
   TypeOfChild child;
   if (index.TryGetValue(parent.ChildId, out child) {
      parent.Child = child;
   }
}
+1

, : -)

, , , . : , , , .

:

public class Item
{    
   private int _id;
   private List<ItemDetails> _detailItems = new List<ItemDetails>();

   public Item(int id)<br>
   {
       _id = id;
   }

   public void AddItemDetail(ItemDetails itemDetail)
   {
       _detailItems.Add(itemDetail);
   }

   public int Id
   {
       get { return _id; }
   }
   public ReadOnlyCollection<ItemDetails> DetailItems
   {
       get { return _detailItems.AsReadOnly(); }
   }
}


public class ItemDetails
{
    private int _parentId;

    public ItemDetails(int parentId)
    {
        _parentId = parentId;
    }

    public int ParentId
    {
        get { return _parentId; }
    }
}

:

- itemdetail . id . . , .

// for performance tests..
DateTime startDateTime;

// create 2 lists  (master/child list)
List<Item> itemList = new List<Item>();
List<ItemDetails> itemDetailList = new List<ItemDetails>();

Debug.WriteLine("# Adding items");
startDateTime = DateTime.Now;

// add items (sorted)
for (int i = 0; i < 400000; i++)
    itemList.Add(new Item(i));

// show how long it took
Debug.WriteLine("Total milliseconds: " + (DateTime.Now - startDateTime).TotalMilliseconds.ToString("0") + "ms" );

// adding some random details (also sorted)
Debug.WriteLine("# Adding itemdetails");
Random rnd = new Random(DateTime.Now.Millisecond);

startDateTime = DateTime.Now;

int index = 0;
for (int i = 0; i < 800000; i++)
{
    // when the random number is bigger than 2, index will be increased by 1
    index += rnd.Next(5) > 2 ? 1 : 0;
    itemDetailList.Add(new ItemDetails(index));
}
Debug.WriteLine("Total milliseconds: " + (DateTime.Now - startDateTime).TotalMilliseconds.ToString("0") + "ms");

// show how many items the lists contains
Debug.WriteLine("ItemList Count: " + itemList.Count());
Debug.WriteLine("ItemDetailList Count: " + itemDetailList.Count());

// matching items
Debug.WriteLine("# Matching items");
startDateTime = DateTime.Now;

int itemIndex = 0;
int itemDetailIndex = 0;

int itemMaxIndex = itemList.Count;
int itemDetailMaxIndex = itemDetailList.Count;

// while we didn't reach any end of the lists, continue...
while ((itemIndex < itemMaxIndex) && (itemDetailIndex < itemDetailMaxIndex))
{
    // if the detail.parentid matches the item.id. add it to the list.
    if (itemList[itemIndex].Id == itemDetailList[itemDetailIndex].ParentId)
    {
        itemList[itemIndex].AddItemDetail(itemDetailList[itemDetailIndex]);
        // increase the detail index.
        itemDetailIndex++;
    }
    else
        // the detail.parentid didn't matches the item.id so check the next 1
        itemIndex++;
}

Debug.WriteLine("Total milliseconds: " + (DateTime.Now - startDateTime).TotalMilliseconds.ToString("0") + "ms");

10 , :

:
: 140ms
itemdetails:
: 203ms
- : 400000
ItemDetailList: 800000
:
: 265

. , . .

Greetz, .

+1

, , FirstOrDefault() .

item.Child = list2.FirstOrDefault(child => child.ID == item.ChildID)

, , Where().

Here, however, the question is, is the method actually slow or slow only the first time it starts after compilation? Check out the discussion in this post .

0
source

Source: https://habr.com/ru/post/1716017/


All Articles