Running Parallel.Foreach Work on Multiple Threads

Question

I have 3 main processing threads, each of which performs operations on the values of a ConcurrentDictionary using Parallel.ForEach. The dictionaries vary in size from 1,000 to 250,000 elements.

    TaskFactory factory = new TaskFactory();
    Task t1 = factory.StartNew(() =>
    {
        Parallel.ForEach(dict1.Values, item => ProcessItem(item));
    });
    Task t2 = factory.StartNew(() =>
    {
        Parallel.ForEach(dict2.Values, item => ProcessItem(item));
    });
    Task t3 = factory.StartNew(() =>
    {
        Parallel.ForEach(dict3.Values, item => ProcessItem(item));
    });
    t1.Wait();
    t2.Wait();
    t3.Wait();

I compared the performance (total runtime) of this construct with simply running the Parallel.ForEach calls on the main thread, and the main-thread version was significantly faster (the runtime was reduced about 5 times).

My questions:

Is there something wrong with the approach above? If so, what and how can be improved?
What is the reason for the different run times?
What is a good way to debug and analyze this situation?

EDIT: To clarify the situation further: I am simulating client calls to a WCF service, each of which arrives on its own thread (hence the tasks). I also tried using ThreadPool.QueueUserWorkItem instead of tasks, with no performance improvement. The objects in the dictionaries have between 20 and 200 properties (only decimals and strings), and there is no I/O activity.

I solved the problem by queuing the processing requests in a BlockingCollection and processing them one at a time.
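The poster's actual code is not shown, so the following is only a minimal sketch of what such a queue might look like. It assumes one consumer task that drains a BlockingCollection and runs a single Parallel.ForEach at a time. The class name RequestQueueSketch, the BuildDictionary helper, and the counting body of ProcessItem are hypothetical stand-ins, not part of the original post.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

public static class RequestQueueSketch
{
    private static int processed;

    // Hypothetical stand-in for the ProcessItem method from the question.
    private static void ProcessItem(decimal item)
    {
        Interlocked.Increment(ref processed);
    }

    // Hypothetical helper: builds a dictionary with `count` dummy entries.
    private static ConcurrentDictionary<int, decimal> BuildDictionary(int count)
    {
        var dict = new ConcurrentDictionary<int, decimal>();
        for (int i = 0; i < count; i++)
        {
            dict[i] = i;
        }
        return dict;
    }

    public static int Run()
    {
        processed = 0;
        using (var requests = new BlockingCollection<ConcurrentDictionary<int, decimal>>())
        {
            // Single consumer: only one Parallel.ForEach is active at any time,
            // so the loops never compete with each other for pool threads.
            Task consumer = Task.Run(() =>
            {
                foreach (var dict in requests.GetConsumingEnumerable())
                {
                    Parallel.ForEach(dict.Values, item => ProcessItem(item));
                }
            });

            // Producers (for example the simulated client calls) only enqueue work.
            requests.Add(BuildDictionary(100));
            requests.Add(BuildDictionary(200));
            requests.Add(BuildDictionary(300));

            // Signal that no more requests will arrive, then wait for the queue to drain.
            requests.CompleteAdding();
            consumer.Wait();
        }
        return processed;
    }

    public static void Main()
    {
        Console.WriteLine("Processed {0} items", Run());
    }
}
```

With this shape, the calling threads return as soon as they have enqueued their dictionary, and each dictionary still gets the full benefit of Parallel.ForEach while it is being processed.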

+5

performance c# parallel-processing .net

anchandra Mar 22 '11 at 22:16

3 answers

First of all, a Task is not a Thread.

Your calls to Parallel.ForEach() are executed by a scheduler that uses the ThreadPool and tries to optimize thread usage. ForEach also applies a Partitioner to split the input. When you run several of these loops in parallel, they cannot coordinate very well.

Only if there is a performance problem should you consider helping out with extra tasks or degree-of-parallelism settings (for example ParallelOptions.MaxDegreeOfParallelism). And even then, always profile and analyze first.
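As an illustration of a degree-of-parallelism setting, here is a minimal sketch that caps one loop with ParallelOptions.MaxDegreeOfParallelism and records how many loop bodies were observed running at once. The class name BoundedParallelismSketch, the Run method, the dummy dictionary, and the SpinWait used as fake work are all assumptions made for the example; the right cap for a real workload has to come from profiling.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

public static class BoundedParallelismSketch
{
    // Runs one capped loop and returns the highest number of loop bodies
    // that were observed executing at the same time.
    public static int Run(int maxDegree)
    {
        // Hypothetical input: 1,000 dummy decimal values.
        var dict = new ConcurrentDictionary<int, decimal>();
        for (int i = 0; i < 1000; i++)
        {
            dict[i] = i;
        }

        int current = 0;
        int peak = 0;
        object gate = new object();

        // Upper bound on the number of concurrent loop bodies for this loop.
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxDegree };

        Parallel.ForEach(dict.Values, options, item =>
        {
            int now = Interlocked.Increment(ref current);
            lock (gate)
            {
                if (now > peak)
                {
                    peak = now;
                }
            }

            // Stand-in for ProcessItem(item): a little CPU-bound busy work.
            Thread.SpinWait(1000);

            Interlocked.Decrement(ref current);
        });

        return peak;
    }

    public static void Main()
    {
        Console.WriteLine("Peak concurrency with a cap of 2: {0}", Run(2));
    }
}
```

A cap like this limits a single loop only. It does not make several concurrently running loops coordinate with each other, which is the situation described in the question.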

It is difficult to explain your results; they could be caused by many factors (I/O, for example). But the advantage of the "main thread only" version is that the scheduler has more control, and the CPU and its cache are used better (locality).

+3

Henk Holterman Mar 22 '11 at 10:55

The dictionaries vary widely in size, and by the looks of it (given that everything finishes in under 5 s) the amount of processing work is small. Without knowing more, it is hard to say what is really going on. How big are your dictionary entries? I assume the main-thread scenario you are comparing against looks like this:

    Parallel.ForEach(dict1.Values, item => ProcessItem(item));
    Parallel.ForEach(dict2.Values, item => ProcessItem(item));
    Parallel.ForEach(dict3.Values, item => ProcessItem(item));

By wrapping each ForEach in a task, you add the overhead of managing those tasks and probably cause memory contention, because dict1, dict2 and dict3 are all trying to be in memory and hot in the cache at the same time. Remember that CPU cycles are cheap; cache misses are not.
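One way to check this on your own data is to time both shapes with a Stopwatch. The sketch below is only an assumption-laden harness: the class TimingSketch, its BuildInputs, RunNested, RunSequential and Measure methods, and the counting ProcessItem are hypothetical, and the dictionary sizes merely echo the range mentioned in the question. Real numbers depend on the actual ProcessItem and hardware, so no result is claimed here.

```csharp
using System;
using System.Collections.Concurrent;
using System.Diagnostics;
using System.Threading;
using System.Threading.Tasks;

public static class TimingSketch
{
    public static long Processed;

    // Hypothetical stand-in for the ProcessItem method from the question.
    private static void ProcessItem(decimal item)
    {
        Interlocked.Increment(ref Processed);
    }

    // Hypothetical helper: one dictionary of dummy values per requested size.
    public static ConcurrentDictionary<int, decimal>[] BuildInputs(params int[] sizes)
    {
        var dicts = new ConcurrentDictionary<int, decimal>[sizes.Length];
        for (int d = 0; d < sizes.Length; d++)
        {
            dicts[d] = new ConcurrentDictionary<int, decimal>();
            for (int i = 0; i < sizes[d]; i++)
            {
                dicts[d][i] = i;
            }
        }
        return dicts;
    }

    // Shape from the question: one task per dictionary, each running its own Parallel.ForEach.
    public static void RunNested(ConcurrentDictionary<int, decimal>[] dicts)
    {
        var tasks = new Task[dicts.Length];
        for (int i = 0; i < dicts.Length; i++)
        {
            var dict = dicts[i]; // local copy so each lambda captures its own dictionary
            tasks[i] = Task.Factory.StartNew(() =>
            {
                Parallel.ForEach(dict.Values, item => ProcessItem(item));
            });
        }
        Task.WaitAll(tasks);
    }

    // Main-thread shape: the loops run one after another.
    public static void RunSequential(ConcurrentDictionary<int, decimal>[] dicts)
    {
        foreach (var dict in dicts)
        {
            Parallel.ForEach(dict.Values, item => ProcessItem(item));
        }
    }

    public static TimeSpan Measure(Action action)
    {
        var watch = Stopwatch.StartNew();
        action();
        watch.Stop();
        return watch.Elapsed;
    }

    public static void Main()
    {
        var dicts = BuildInputs(1000, 50000, 250000);
        RunSequential(dicts); // warm-up run so JIT and thread pool start-up are not measured
        Console.WriteLine("Nested tasks:      {0}", Measure(() => RunNested(dicts)));
        Console.WriteLine("Sequential loops:  {0}", Measure(() => RunSequential(dicts)));
    }
}
```

Running each variant several times and comparing the spread gives a more trustworthy picture than a single measurement, since thread pool warm-up can dominate short runs.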

+2

Ade Miller Mar 22 '11 at 10:53

digEmAll · Accepted Answer · 2011-03-22T22:31:29+0000

You are probably over-parallelizing.

You do not need to create 3 tasks if you already use good (and balanced) parallelization inside each of them.

Parallel.ForEach already tries to use the right number of threads to exploit the full potential of the CPU without saturating it. By creating additional tasks that each run their own Parallel.ForEach, you are likely to saturate it.
(EDIT: as Henk said, the loops probably have trouble coordinating the number of threads to spawn when they run in parallel, and at the very least this leads to a lot of overhead.)

Check here for some tips.
