Memory grows unexpectedly when using Entity Framework for bulk insertion

I need to process about 1 million objects to create facts. The total number of facts should be roughly the same (1 million).

The first problem I ran into was that the bulk insert was slow through the Entity Framework infrastructure, so I used the pattern from Fastest Way of Inserting in Entity Framework (the answer from Slauma). With it I can insert entities quite quickly, around 100K per minute.

The other issue I ran into was not having enough memory to handle everything at once, so I "paged" the processing to avoid the OutOfMemoryException I get if I build the list of all 1 million facts in one go.

The problem is that memory keeps growing even with the paging, and I don't understand why. After each batch, the memory is not freed. I find this strange, because I fetch the recons, generate the facts, and store them in the database on each iteration of the loop. Once an iteration completes, those objects should become eligible for garbage collection. But that does not seem to happen, because the memory is not released after each iteration.

Can you tell me if you see something wrong before I dig any deeper? More specifically: why is the memory not freed after an iteration of the while loop?

static void Main(string[] args)
{
    ReceiptsItemCodeAnalysisContext db = new ReceiptsItemCodeAnalysisContext();

    var recon = db.Recons
        .Where(r => r.Transacs.Where(t => t.ItemCodeDetails.Count > 0).Count() > 0)
        .OrderBy(r => r.ReconNum);

    // used for "paging" the processing
    var processed = 0;
    var total = recon.Count();
    var batchSize = 1000; //100000;
    var batch = 1;
    var skip = 0;
    var doBatch = true;

    while (doBatch)
    {
        // list to store facts processed during the batch
        List<ReconFact> facts = new List<ReconFact>();

        // get the Recon items to process in this batch, put them in a list
        List<Recon> toProcess = recon.Skip(skip).Take(batchSize)
            .Include(r => r.Transacs.Select(t => t.ItemCodeDetails))
            .ToList();

        // to process real fast
        Parallel.ForEach(toProcess, r =>
        {
            // processing a recon and adding the facts to the list
            var thisReconFacts = ReconFactGenerator.Generate(r);
            thisReconFacts.ForEach(f => facts.Add(f));
            Console.WriteLine(processed += 1);
        });

        // saving the facts using the pattern provided by Slauma
        using (TransactionScope scope = new TransactionScope(TransactionScopeOption.Required,
            new System.TimeSpan(0, 15, 0)))
        {
            ReceiptsItemCodeAnalysisContext context = null;
            try
            {
                context = new ReceiptsItemCodeAnalysisContext();
                context.Configuration.AutoDetectChangesEnabled = false;

                int count = 0;
                foreach (var fact in facts.Where(f => f != null))
                {
                    count++;
                    Console.WriteLine(count);
                    context = ContextHelper.AddToContext(context, fact, count, 250, true);
                    //context.AddToContext(context, fact, count, 250, true);
                }
                context.SaveChanges();
            }
            finally
            {
                if (context != null)
                    context.Dispose();
            }
            scope.Complete();
        }

        Console.WriteLine("batch {0} finished, continuing", batch);

        // continuing with the next batch
        batch++;
        skip = batchSize * (batch - 1);
        doBatch = skip < total;
        // AFTER THIS, facts AND toProcess SHOULD BE RESET
        // BUT IT LOOKS LIKE THEY ARE NOT, OR AT LEAST SOMETHING
        // IS GROWING IN MEMORY
    }
    Console.WriteLine("Processing is done, {0} recons processed", processed);
}
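One side note on the loop above (an observation about the posted code, separate from the memory question): List<ReconFact>.Add is not thread-safe, so calling it from Parallel.ForEach can drop or corrupt entries. A minimal sketch of a safer variant, reusing toProcess and ReconFactGenerator from the question:

using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

// ConcurrentBag is safe for concurrent Add, unlike List<T>.
var facts = new ConcurrentBag<ReconFact>();
int processed = 0;

Parallel.ForEach(toProcess, r =>
{
    // Generate this recon's facts and add them without racing.
    foreach (var fact in ReconFactGenerator.Generate(r))
        facts.Add(fact);

    // Interlocked keeps the shared progress counter consistent.
    Console.WriteLine(Interlocked.Increment(ref processed));
});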

The method provided by Slauma to optimize a bulk insert with Entity Framework:

class ContextHelper
{
    public static ReceiptsItemCodeAnalysisContext AddToContext(
        ReceiptsItemCodeAnalysisContext context,
        ReconFact entity, int count, int commitCount, bool recreateContext)
    {
        context.Set<ReconFact>().Add(entity);

        if (count % commitCount == 0)
        {
            context.SaveChanges();
            if (recreateContext)
            {
                context.Dispose();
                context = new ReceiptsItemCodeAnalysisContext();
                context.Configuration.AutoDetectChangesEnabled = false;
            }
        }
        return context;
    }
}
2 answers

Try telling the object context not to track the objects, for example:

static void Main(string[] args)
{
    ReceiptsItemCodeAnalysisContext db = new ReceiptsItemCodeAnalysisContext();

    var recon = db.Recons
        .AsNoTracking() // <---- add this
        .Where(r => r.Transacs.Where(t => t.ItemCodeDetails.Count > 0).Count() > 0)
        .OrderBy(r => r.ReconNum);
    //...

In the code as you have it, all one million Recon objects will accumulate in memory until the object context is disposed.
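To see this for yourself, you can count what the context is tracking after each batch (a diagnostic sketch, not part of the original answer; it assumes EF6's DbChangeTracker API):

using System.Linq;

// Prints how many entities the context is currently tracking.
// Without AsNoTracking() this number keeps growing batch after batch.
Console.WriteLine("Tracked entities: {0}", db.ChangeTracker.Entries().Count());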


Since you use the same data context for the whole run, the entities are presumably being cached in it. Generally speaking, whenever I ran into this problem, I found it easiest to give each "batch" its own data context that goes out of scope at the end of the iteration.
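A minimal sketch of that structure, reusing the names from the question (the fact generation and saving steps are elided):

using System.Data.Entity; // for Include()
using System.Linq;

int batchSize = 1000;
int skip = 0;
bool doBatch = true;

while (doBatch)
{
    // A fresh context per batch: everything it tracked becomes
    // unreachable when the using block ends, so the GC can reclaim it.
    using (var db = new ReceiptsItemCodeAnalysisContext())
    {
        db.Configuration.AutoDetectChangesEnabled = false;

        var toProcess = db.Recons
            .Where(r => r.Transacs.Any(t => t.ItemCodeDetails.Count > 0))
            .OrderBy(r => r.ReconNum)
            .Skip(skip).Take(batchSize)
            .Include(r => r.Transacs.Select(t => t.ItemCodeDetails))
            .ToList();

        // ... generate and save the facts for this batch ...

        // Stop when a page comes back short; an evenly divisible total
        // costs one extra (empty) query, which is harmless.
        doBatch = toProcess.Count == batchSize;
        skip += batchSize;
    }
}

This composes with the AsNoTracking() suggestion above: read-only pages don't need change tracking at all.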


Source: https://habr.com/ru/post/956879/
