What should I specify as the dop parameter for the ForEachAsync extension method?

I recently discovered the following code (see link) for efficiently running many I/O-related tasks: http://blogs.msdn.com/b/pfxteam/archive/2012/03/05/10278165.aspx

I get the impression that the following is true:

  • This is much better than using Parallel.ForEach, because the work is not CPU-bound.
  • ForEachAsync helps run as many I/O tasks concurrently as possible (without hosting them on separate threads).
  • The TPL will "know" these are I/O-based tasks and, rather than deploying more threads, will use a task continuation / completion source to signal back to the caller, saving the overhead of thread context switches.

My question: Parallel.ForEach has its own MaxDegreeOfParallelism, so how do I determine what to pass as the dop parameter in the IEnumerable extension code below?

E.g., if I have 1000 elements to process and need to make a SQL Server database call for each element, would I pass 1000 as dop? With Parallel.ForEach, MaxDegreeOfParallelism is used as a limiter to prevent too many threads from hurting performance. But here it seems to be used to partition the work into a number of asynchronous tasks. I would think there should be no maximum at all (only the requirement that all elements get processed), because I want to queue as many I/O-based database queries as possible.

How do I know what to pass as the dop parameter?

public static Task ForEachAsync<T>(this IEnumerable<T> source, int dop, Func<T, Task> body)
{
    return Task.WhenAll(
        from partition in Partitioner.Create(source).GetPartitions(dop)
        select Task.Run(async delegate
        {
            using (partition)
                while (partition.MoveNext())
                    await body(partition.Current);
        }));
}
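For context, here is a self-contained sketch of a typical call site (the dop value of 8 and the SimulatedDbCallAsync helper are my own illustrative choices, not from the post; the extension method is repeated so the sketch compiles on its own):

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

static class ForEachAsyncDemo
{
    // The extension method from the linked blog post.
    public static Task ForEachAsync<T>(this IEnumerable<T> source, int dop, Func<T, Task> body)
    {
        return Task.WhenAll(
            from partition in Partitioner.Create(source).GetPartitions(dop)
            select Task.Run(async delegate
            {
                using (partition)
                    while (partition.MoveNext())
                        await body(partition.Current);
            }));
    }

    // Stand-in for the real database call (illustrative only).
    static Task SimulatedDbCallAsync(int item) => Task.Delay(1);

    public static async Task<int> RunAsync()
    {
        int processed = 0;
        // dop = 8: at most 8 simulated database calls are in flight at once.
        await Enumerable.Range(1, 1000).ForEachAsync(8, async item =>
        {
            await SimulatedDbCallAsync(item);
            Interlocked.Increment(ref processed);
        });
        return processed;
    }
}
```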
1 answer

Parallel.ForEach has its own MaxDegreeOfParallelism

The heuristic built into Parallel.ForEach is very prone to injecting a huge number of threads over time (if your work items have a latency of 10 ms, you end up with hundreds of threads within an hour or so; I have measured this). It is a really terrible design flaw; don't try to imitate it.

With parallel I/O, there is no alternative to determining the correct value empirically; that is why the TPL is so bad at it. For example, magnetic disks performing sequential I/O have an optimal DOP of 1, while an SSD makes random access cheap enough that a much higher DOP (100?) can pay off.

A remote web service does not let you know the correct DOP either. You not only need to test empirically, you also need to ask the owner whether it is acceptable to hit the service with that many requests, since it could overload it.

would I pass 1000 as dop?

Then you do not need this helper at all. Just create all the tasks and then wait for them all. But 1000 is most likely a wrong DOP, because it overloads the database without any benefit.
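A minimal sketch of that "just create all the tasks" approach, with Task.Delay standing in for the real database call (my own illustrative stand-in, not from the post):

```csharp
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

static class WhenAllDemo
{
    // Stand-in for the real database call (illustrative only).
    static Task ProcessAsync(int item) => Task.Delay(1);

    public static async Task<int> RunAsync()
    {
        int processed = 0;
        var items = Enumerable.Range(1, 1000);

        // No DOP limit: one task per element, all started eagerly.
        // This is effectively what dop = 1000 amounts to, and why the
        // answer warns it would flood a real database with requests.
        await Task.WhenAll(items.Select(async i =>
        {
            await ProcessAsync(i);
            Interlocked.Increment(ref processed);
        }));
        return processed;
    }
}
```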

here it seems to be used to partition the work into a number of asynchronous tasks

Another terrible aspect of Parallel.For: on machines with little CPU load it can settle on a small number of tasks. Awful API. Do not use it with I/O. (I use AsParallel instead, which lets you set an exact DOP, not just a maximum DOP.)
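A sketch of the AsParallel alternative the answer mentions (the DOP of 8 is again an illustrative choice; note that PLINQ delegates are synchronous, so the stand-in I/O call blocks rather than awaits):

```csharp
using System.Linq;
using System.Threading;

static class PlinqDemo
{
    public static int Run()
    {
        int processed = 0;
        Enumerable.Range(1, 1000)
            .AsParallel()
            .WithDegreeOfParallelism(8) // the exact DOP referred to in the answer
            .ForAll(item =>
            {
                // Stand-in for synchronous I/O (illustrative only).
                Thread.Sleep(1);
                Interlocked.Increment(ref processed);
            });
        return processed;
    }
}
```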

because I want to queue as many I/O-based database queries as possible

Why? Bad plan.


Btw, the method you posted here is good, and I use it as well. I wish it were part of the framework. This exact method is the answer to about 10 SO questions per week ("How can I asynchronously process 100,000 elements in parallel?").


Source: https://habr.com/ru/post/1234493/
