Maximum Concurrency Throttle

I expect there are many possible solutions to this. I can come up with a few myself, some clearly better than others, but I'm sure none of them is optimal, so I'm interested in hearing from you real multi-threading gurus.

I have about 100 pieces of work that can all be performed at the same time, since there are no dependencies between them. If I execute them sequentially, my total execution time is around 1:30. If I instead queue every piece of work onto the thread pool, it takes about 2 minutes, which tells me I'm trying to do too much at once, and the context switching between all those threads is negating the benefit of having them.

So, working on the assumption (please feel free to shoot me down if this is wrong) that if I only ever queue as many work items as there are cores in my system (8 on this machine), I will at any given time minimize context switching and thereby improve overall efficiency (other processes' threads notwithstanding, of course), can anyone suggest an optimal pattern/method for this?
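For illustration, here is a minimal sketch of the kind of throttling being asked about, assuming .NET 4 is available: a SemaphoreSlim caps the number of work items in flight at the core count. ProcessWorkItem is a hypothetical stand-in for the real work.

    using System;
    using System.Threading;

    class ThrottledRunner
    {
        static void Main()
        {
            int maxConcurrency = Environment.ProcessorCount; // 8 on the machine described
            var throttle = new SemaphoreSlim(maxConcurrency);
            var done = new CountdownEvent(100);

            for (int i = 0; i < 100; i++)
            {
                int item = i;
                throttle.Wait(); // block until one of the N slots frees up
                ThreadPool.QueueUserWorkItem(_ =>
                {
                    try { ProcessWorkItem(item); } // hypothetical work method
                    finally
                    {
                        throttle.Release(); // hand the slot to the next item
                        done.Signal();
                    }
                });
            }

            done.Wait(); // wait for all 100 items to finish
        }

        static void ProcessWorkItem(int item) { /* ... the actual work ... */ }
    }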

BTW, I'm using smartthreadpool.codeplex.com, but I don't have to.

+4
2 answers

A good threadpool already tries to have one active thread per available core. That isn't the same as one thread working per core, though: if a thread blocks (most classically on I/O), you want another thread using that core in the meantime.

Trying the built-in .NET threadpool instead may be worth a shot, or the Parallel class.
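As a sketch of the Parallel route (assuming .NET 4), ParallelOptions.MaxDegreeOfParallelism caps how many iterations run concurrently; Process here is a hypothetical stand-in for one piece of the work:

    using System;
    using System.Linq;
    using System.Threading.Tasks;

    class ParallelDemo
    {
        static void Main()
        {
            var workItems = Enumerable.Range(0, 100);
            var options = new ParallelOptions
            {
                // cap concurrently executing iterations at the core count
                MaxDegreeOfParallelism = Environment.ProcessorCount
            };

            Parallel.ForEach(workItems, options, item =>
            {
                Process(item); // hypothetical per-item work
            });
        }

        static void Process(int item) { /* ... */ }
    }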

If your processor is hyperthreaded (8 virtual cores on 4 physical), this can be a factor. Hyperthreading makes things faster on average, but there are plenty of cases where it makes them worse. Try setting affinity to every other core and see if that gives you an improvement; if it does, this is most likely a case where hyperthreading hurts.
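One way to run that experiment on Windows is the Process.ProcessorAffinity property; the 0x55 mask below assumes sibling hyperthreads are numbered adjacently (0/1, 2/3, ...), which is common but worth verifying on your machine:

    using System;
    using System.Diagnostics;

    class AffinityDemo
    {
        static void Main()
        {
            // 0x55 = binary 01010101 -> logical cores 0, 2, 4 and 6,
            // i.e. one hyperthread per physical core on this assumed layout.
            Process.GetCurrentProcess().ProcessorAffinity = (IntPtr)0x55;

            // ... run the timed workload here and compare with the default ...
        }
    }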

Do you need to collate the results, or share any resources between the different tasks? The cost of doing so may well outweigh the savings from multithreading. Perhaps that cost is unnecessary, though: for example, if you lock around shared data but that data is only ever read, you don't need the lock with most data structures (most, but not all, are safe to read concurrently as long as nothing writes).
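As a small illustration of the read-only case (assuming the shared structure is fully built before any worker starts): a plain Dictionary is documented as safe for concurrent reads provided nothing modifies it, so no lock is needed:

    using System.Collections.Generic;
    using System.Threading.Tasks;

    class ReadOnlyShared
    {
        // Built once before any worker runs and never mutated afterwards.
        static readonly Dictionary<string, int> Lookup = new Dictionary<string, int>
        {
            { "alpha", 1 }, { "beta", 2 }
        };

        static void Main()
        {
            // Concurrent reads of an unchanging Dictionary are safe without a lock.
            Parallel.For(0, 100, i =>
            {
                int value = Lookup["alpha"]; // read-only access from many threads
            });
        }
    }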

Dividing up the work can itself be a problem. Say the single-threaded approach works its way through a region of memory, but the multi-threaded approach hands each thread the next bit of memory in round-robin fashion. There will be more cache flushing per core, because the "handy next bit" is actually being used by a different core. In that situation, splitting the work into larger chunks fixes it.
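In .NET 4, one way to get that larger-chunk behavior is a range Partitioner, which hands each thread a contiguous block of indices rather than interleaving them; data and ProcessElement below are hypothetical:

    using System.Collections.Concurrent;
    using System.Threading.Tasks;

    class ChunkedDemo
    {
        static void Main()
        {
            double[] data = new double[1000000];

            // Partitioner.Create(0, length) yields contiguous index ranges,
            // so each core walks its own block of memory instead of
            // interleaving with the others.
            Parallel.ForEach(Partitioner.Create(0, data.Length), range =>
            {
                for (int i = range.Item1; i < range.Item2; i++)
                    ProcessElement(ref data[i]); // hypothetical per-element work
            });
        }

        static void ProcessElement(ref double d) { d = d * 2; }
    }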

There are plenty of other factors that can make a multi-threaded approach slower than a single-threaded one, but those are a few that come to mind right away.

Edit: if you're writing to a shared store of any kind, it's worth trying a run where you simply throw the results away. That may narrow down where the problem lies.

+5

What you describe sounds strange to me, because by definition a thread pool shouldn't use more resources than the system has available (i.e. if you have 4 cores, it will use 4 threads, or something close to that number). It keeps a queue from which its worker threads take tasks and execute them. So you can't really overload the system by using a thread pool, unless you manually specify the number of threads to use, which is not recommended in your case.

Have you tried using the standard C# ThreadPool class instead?
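A minimal sketch of that route (DoWork is a hypothetical stand-in; the pool itself decides how many threads actually run at once, roughly tracking the available cores):

    using System.Threading;

    class ThreadPoolDemo
    {
        static void Main()
        {
            var done = new CountdownEvent(100);

            for (int i = 0; i < 100; i++)
            {
                int item = i;
                // No manual thread count: the pool schedules the queued items
                // across its worker threads on its own.
                ThreadPool.QueueUserWorkItem(_ =>
                {
                    try { DoWork(item); } // hypothetical work method
                    finally { done.Signal(); }
                });
            }

            done.Wait(); // block until all 100 items have completed
        }

        static void DoWork(int item) { /* ... */ }
    }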

+1

Source: https://habr.com/ru/post/1385321/

