Parallel.For System.OutOfMemoryException

We have a fairly simple program that is used to create backups. I'm trying to parallelize it, but I'm getting an OutOfMemoryException inside an AggregateException. Some of the source folders are quite large, and the program doesn't crash until about 40 minutes after it launches. I don't know where to start looking, so the code below is a near-exact dump of the whole program, minus the directory structure and the exception-logging code. Any advice on where to start looking?

using System;
using System.Diagnostics;
using System.IO;
using System.Threading.Tasks;

namespace SelfBackup
{
    class Program
    {
        static readonly string[] saSrc = {
            "\\src\\dir1\\",
            //...
            "\\src\\dirN\\", //this folder is over 6 GB
        };
        static readonly string[] saDest = {
            "\\dest\\dir1\\",
            //...
            "\\dest\\dirN\\",
        };

        static void Main(string[] args)
        {
            Parallel.For(0, saDest.Length, i =>
            {
                string sDest = saDest[i]; //destination for this iteration
                try
                {
                    if (Directory.Exists(sDest))
                    {
                        //Delete directory first so old stuff gets cleaned up
                        Directory.Delete(sDest, true);
                    }
                    //recursive function
                    clsCopyDirectory.copyDirectory(saSrc[i], sDest);
                }
                catch (Exception e)
                {
                    //standard error logging
                    CL.EmailError();
                }
            });
        }
    }
}

///////////////////////////////////////

using System.IO;
using System.Threading.Tasks;

namespace SelfBackup
{
    static class clsCopyDirectory
    {
        static public void copyDirectory(string Src, string Dst)
        {
            Directory.CreateDirectory(Dst);

            /* Copy all the files in the folder.
               If and when .NET 4.0 is installed, change Directory.GetFiles
               to Directory.EnumerateFiles for slightly better performance. */
            Parallel.ForEach<string>(Directory.GetFiles(Src), file =>
            {
                /* An exception thrown here may be arbitrarily deep into this
                   recursive function. There's also a good chance that if one
                   copy fails here, so too will other files in the same
                   directory, so we don't want to spam out hundreds of error
                   e-mails, but we don't want to abort altogether either.
                   Instead, the best solution is probably to throw back up to
                   the original caller of copyDirectory and move on to the
                   next Src/Dst pair by not catching any possible exception
                   here. */
                File.Copy(file,                                      //src
                          Path.Combine(Dst, Path.GetFileName(file)), //dest
                          true);                                     //bool overwrite
            });

            //Call this function again for every directory in the folder.
            Parallel.ForEach(Directory.GetDirectories(Src), dir =>
            {
                copyDirectory(dir, Path.Combine(Dst, Path.GetFileName(dir)));
            });
        }
    }
}

The Threads debug window shows 417 worker threads at the time of the exception.

EDIT: The copying is from one server to another. I'm now trying to run the code with the last Parallel.ForEach changed to a plain foreach; a sketch of that change is below.
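For reference, a sketch of what that change presumably looks like, assuming "last" means the directory-recursion loop (the second Parallel.ForEach in copyDirectory above). Files within a directory are still copied in parallel, but recursion into subdirectories becomes sequential:

    static public void copyDirectory(string Src, string Dst)
    {
        Directory.CreateDirectory(Dst);

        //Files within one directory are still copied in parallel.
        Parallel.ForEach(Directory.GetFiles(Src), file =>
        {
            File.Copy(file, Path.Combine(Dst, Path.GetFileName(file)), true);
        });

        //Was: Parallel.ForEach(Directory.GetDirectories(Src), dir => ...);
        foreach (string dir in Directory.GetDirectories(Src))
        {
            copyDirectory(dir, Path.Combine(Dst, Path.GetFileName(dir)));
        }
    }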

1 answer

I'm going to make a few guesses here, as I haven't yet had any feedback on the comment I left on your question.

I'm guessing the large number of worker threads is because actions (an action being the unit of work carried out on the parallel foreach) are taking longer than a certain amount of time, so the underlying ThreadPool is growing the number of threads. The ThreadPool follows a pool-growth algorithm so that new tasks are not blocked by existing long-running tasks, e.g. if all my current threads have been busy for half a second, I'll start adding more threads to the pool. You will get into trouble, though, if all the tasks are long-running and the new tasks you add make the existing tasks take even longer. This is why you are probably seeing a large number of worker threads: likely disk thrashing or slow network I/O (if network drives are involved). A small demonstration of that growth follows.
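This is not the asker's code, just a minimal, self-contained sketch that makes the pool growth visible: each action blocks far longer than the pool's injection interval, so the process's thread count climbs steadily while the loop runs.

using System;
using System.Diagnostics;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

class ThreadGrowthDemo
{
    static void Main()
    {
        //Print the process's thread count once a second.
        using (var watcher = new Timer(_ =>
            Console.WriteLine("Threads: " + Process.GetCurrentProcess().Threads.Count),
            null, 0, 1000))
        {
            //500 long-running, blocking "copy" actions. Because every pool
            //thread stays busy, the ThreadPool assumes it is starved and
            //keeps injecting more worker threads, so the count climbs.
            Parallel.ForEach(Enumerable.Range(0, 500), i =>
            {
                Thread.Sleep(10000); //stand-in for slow, blocking file I/O
            });
        }
    }
}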

I'm also guessing that files are being copied from one disk to another, or from one location to another on the same disk. In that case, throwing threads at the problem is not going to help much. The source and destination disks each have only one set of heads, so trying to make them do several things at once is likely to actually slow things down:

  • The disk heads will be thrashing all over the place.
  • Your disk\OS caches may be frequently invalidated.

So this may not be a great candidate for parallelization.

Update

In answer to your comment: if you are seeing a speed-up from multiple threads on smaller data sets, then you could experiment with lowering the maximum number of threads used in your parallel foreach, e.g.

 ParallelOptions options = new ParallelOptions { MaxDegreeOfParallelism = 2 };

 Parallel.ForEach(Directory.GetFiles(Src), options, file =>
 {
     //Do stuff
 });

But please do bear in mind that disk thrashing may negate any benefits of parallelization in the general case. Play around with it and measure your results; a rough sketch of such a measurement follows.
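A hypothetical measurement harness might look like the following (CopyBenchmark and the concrete paths are illustrative only, reusing the placeholders from the question): it times the same copy at several degrees of parallelism so you can compare.

 using System;
 using System.Diagnostics;
 using System.IO;
 using System.Threading.Tasks;

 class CopyBenchmark
 {
     static void Main()
     {
         //Try a few degrees of parallelism and time each pass.
         foreach (int dop in new[] { 1, 2, 4, 8 })
         {
             var options = new ParallelOptions { MaxDegreeOfParallelism = dop };
             var sw = Stopwatch.StartNew();

             Parallel.ForEach(Directory.GetFiles("\\src\\dir1\\"), options, file =>
             {
                 File.Copy(file,
                           Path.Combine("\\dest\\dir1\\", Path.GetFileName(file)),
                           true);
             });

             sw.Stop();
             Console.WriteLine("DOP {0}: {1} ms", dop, sw.ElapsedMilliseconds);
         }
     }
 }

Bear in mind that OS file caching will make later passes look faster than the first, so run each configuration more than once before drawing conclusions.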


Source: https://habr.com/ru/post/1272650/

