Retrieving a large number of records under several constraints without causing an out-of-memory exception

I have the following situation:

  • There are two related types. For this question, I will use the following simple types:

    public class Person
    {
        public Guid Id { get; set; }
        public int status { get; set; }
    }

    public class Account
    {
        public Guid AccountId { get; set; }
        public decimal Amount { get; set; }
        public Guid PersonId { get; set; }
    }

    So one Person can have several Accounts (i.e. several Account records refer to the same PersonId).

  • There are tens of thousands of people in our database, and each of them has an average of 5-10 accounts.

  • I need to get each person's accounts that meet certain requirements. Afterwards, I need to check whether each of those persons satisfies additional conditions.

    In this example, let's say I need every account with Amount < 100, and after getting the accounts of one person, I need to check whether their total is more than 1000.

  • A LINQ query is preferable, but it cannot use the group ... by ... into keywords, because the LINQ provider (LINQ-to-CRM) does not support them.

  • In addition, executing the following simple LINQ query to implement the requirements of bullet 3 is also not possible (please read the inline comment):

    var query = from p in personList
                join a in accountList on p.Id equals a.PersonId
                where a.Amount < 100
                select a;

    var groups = query.GroupBy(a => a.PersonId);
    // and now, process the groups in batches of x
    // (where x is a number of groups that won't cause an out-of-memory exception)

    This is not possible for two reasons:

    a. The Linq-Provider forces a call to ToList() before using GroupBy() .

    b. Trying to actually call ToList() before using GroupBy() results in an out-of-memory exception - since there are tens of thousands of accounts.

  • For efficiency reasons, I don’t want to do the following, as this means tens of thousands of queries:

    a. Get all the people out.

    b. Go through them and retrieve the accounts of each user at each iteration.

I would be glad to hear any efficient ideas.

2 answers

I would suggest ordering the query by PersonId, switching to LINQ to Objects via AsEnumerable() (thus executing it, but without materializing the entire result set in memory the way a ToList() call would), and then using GroupAdjacent from the MoreLINQ package:

This method is implemented by using deferred execution and streams the groupings. The grouping elements, however, are buffered. Each grouping is therefore yielded as soon as it is complete and before the next grouping occurs.

    var query = from p in personList
                join a in accountList on p.Id equals a.PersonId
                where a.Amount < 100
                orderby a.PersonId
                select a;

    var groups = query.AsEnumerable()
                      .GroupAdjacent(a => a.PersonId)
                      .Where(g => g.Sum(a => a.Amount) > 1000);

The AsEnumerable() trick works great with the EF query provider. Whether it works with the LINQ to CRM provider really depends on how the provider implements the GetEnumerator() method: if it still tries to load the entire query result, then you're out of luck.
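
To illustrate why the ordering matters, here is a minimal sketch of adjacent grouping written from scratch. It is not MoreLINQ's actual implementation; GroupAdjacentSketch, AdjacentGrouping and the sample data are hypothetical names used only for illustration. The point it tries to show is that only the current group is buffered while the source is streamed, which only gives correct groups if equal keys arrive next to each other.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Account
{
    public Guid AccountId { get; set; }
    public decimal Amount { get; set; }
    public Guid PersonId { get; set; }
}

public static class AdjacentGrouping
{
    // Hypothetical stand-in for MoreLINQ's GroupAdjacent: yields one
    // (key, items) pair per run of consecutive elements with equal keys.
    public static IEnumerable<KeyValuePair<TKey, List<T>>> GroupAdjacentSketch<T, TKey>(
        this IEnumerable<T> source, Func<T, TKey> keySelector)
    {
        var comparer = EqualityComparer<TKey>.Default;
        List<T> buffer = null;
        TKey currentKey = default(TKey);

        foreach (var item in source)
        {
            var key = keySelector(item);
            if (buffer != null && !comparer.Equals(key, currentKey))
            {
                // The key changed, so the previous run is complete: hand it out.
                yield return new KeyValuePair<TKey, List<T>>(currentKey, buffer);
                buffer = null;
            }
            if (buffer == null)
            {
                buffer = new List<T>();
                currentKey = key;
            }
            buffer.Add(item);
        }

        // Emit the last run, if the source was not empty.
        if (buffer != null)
            yield return new KeyValuePair<TKey, List<T>>(currentKey, buffer);
    }
}

public static class Program
{
    public static void Main()
    {
        var alice = Guid.NewGuid();
        var bob = Guid.NewGuid();

        // In-memory sample where accounts of the same person are adjacent,
        // as the orderby clause in the query would guarantee.
        var accounts = Enumerable.Repeat(alice, 11).Concat(Enumerable.Repeat(bob, 3))
            .Select(id => new Account { AccountId = Guid.NewGuid(), PersonId = id, Amount = 99m });

        var qualifying = accounts
            .GroupAdjacentSketch(a => a.PersonId)
            .Where(g => g.Value.Sum(a => a.Amount) > 1000)
            .Select(g => g.Key)
            .ToList();

        Console.WriteLine(qualifying.Count); // prints 1 (11 * 99 = 1089 > 1000, 3 * 99 = 297)
    }
}
```

If the input were not sorted by PersonId, the same person could show up in several separate runs and the totals would be wrong, which is why the orderby clause is part of the suggestion.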


I would order by the GUID and then process in blocks:

    var basep = (from p in personList select p.Id).OrderBy(id => id);
    int basepCount = personList.Count();
    int blocksize = 1000;
    int numblocks = (basepCount / blocksize) + (basepCount % blocksize == 0 ? 0 : 1);

    for (var block = 0; block < numblocks; ++block)
    {
        var firstPersonId = basep.Skip(block * blocksize).First();
        var lastPersonId = basep.Skip(Math.Min(basepCount - 1, block * blocksize + blocksize - 1)).First();

        var query = from p in personList.Where(ps => firstPersonId.CompareTo(ps.Id) <= 0 && ps.Id.CompareTo(lastPersonId) <= 0)
                    join a in accountList on p.Id equals a.PersonId
                    where a.Amount < 100
                    select a;

        var groups = query.GroupBy(a => a.PersonId);
        // work on groups
    }
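
The block boundaries can be checked in isolation with an in-memory list. The sketch below is only an illustration under that assumption: BlockPaging, BlockBounds and BlockDemo are hypothetical names, and a real provider may order GUIDs differently from Guid.CompareTo, so the range filter should be evaluated by the same engine that did the ordering.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class BlockPaging
{
    // Hypothetical helper: splits an ordered list of ids into inclusive
    // (first, last) ranges of at most blockSize ids each.
    public static List<Tuple<Guid, Guid>> BlockBounds(IReadOnlyList<Guid> orderedIds, int blockSize)
    {
        var bounds = new List<Tuple<Guid, Guid>>();
        int count = orderedIds.Count;
        int numBlocks = (count + blockSize - 1) / blockSize; // ceiling division

        for (int block = 0; block < numBlocks; ++block)
        {
            int firstIndex = block * blockSize;
            // The last block may be shorter than blockSize.
            int lastIndex = Math.Min(count - 1, firstIndex + blockSize - 1);
            bounds.Add(Tuple.Create(orderedIds[firstIndex], orderedIds[lastIndex]));
        }
        return bounds;
    }
}

public static class BlockDemo
{
    public static void Main()
    {
        // 2500 random ids, ordered the same way the range check compares them.
        var ids = Enumerable.Range(0, 2500)
            .Select(_ => Guid.NewGuid())
            .OrderBy(id => id)
            .ToList();

        var bounds = BlockPaging.BlockBounds(ids, 1000);

        // Count how many ids fall into each inclusive range; with distinct ids
        // every id should be counted exactly once across all blocks.
        int covered = bounds.Sum(b =>
            ids.Count(id => b.Item1.CompareTo(id) <= 0 && id.CompareTo(b.Item2) <= 0));

        Console.WriteLine(bounds.Count + " blocks, " + covered + " ids covered");
    }
}
```

Because each range is inclusive on both ends and consecutive ranges start at the next id, no person is processed twice as long as the ids are unique.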

Source: https://habr.com/ru/post/1269452/

