A good thread pool already tries to keep one active thread per available core. It is not strictly one thread per core, though: if a thread blocks (most classically on I/O), you want another thread available to use that core.
Switching to the .NET thread pool, or to the Parallel class, may be worth a try.
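As a minimal sketch of the Parallel-class route (the workload here is just a summation, standing in for whatever your real per-item work is):

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

class ParallelSketch
{
    static void Main()
    {
        long total = 0;

        // Parallel.For schedules iterations onto ThreadPool threads,
        // which aim for roughly one active worker per core.
        Parallel.For(0, 100, i =>
        {
            // Interlocked avoids a data race on the shared accumulator.
            Interlocked.Add(ref total, i);
        });

        Console.WriteLine(total); // 0 + 1 + ... + 99 = 4950
    }
}
```

Parallel.For handles partitioning and thread reuse for you, so it is a quick way to test whether manual thread management was the problem.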
If your processor has hyperthreading (e.g. 8 logical cores on 4 physical ones), that can be a problem. Hyperthreading makes things faster on average, but there are plenty of cases where it makes them worse. Try setting processor affinity to every other logical core and see if that gives you an improvement; if it does, this is most likely one of the cases where hyperthreading hurts.
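A sketch of setting that affinity, assuming the common layout where logical CPUs pair up as (0,1), (2,3), and so on; the pairing is platform-specific, so verify it on your machine before relying on the mask:

```csharp
using System;
using System.Diagnostics;

class AffinitySketch
{
    static void Main()
    {
        // Assumed layout: logical CPUs (0,1) share a physical core,
        // (2,3) share the next, etc. Mask 0b01010101 then pins the
        // process to one logical CPU per physical core, effectively
        // taking hyperthreading out of the picture for this process.
        Process p = Process.GetCurrentProcess();
        p.ProcessorAffinity = (IntPtr)0b01010101;

        Console.WriteLine(p.ProcessorAffinity);
    }
}
```

If the timings improve with this mask, the siblings were competing for execution units, and you may want to size your worker count to physical rather than logical cores.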
Do you need to combine results, or share any resources between the different tasks? The cost of that can easily exceed the savings from multithreading. The synchronization may even be unnecessary: for example, if you lock shared data that is only ever read, you probably do not need the lock at all. Most (though not all) data structures are safe to read concurrently as long as nothing writes to them.
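A sketch of the read-only case, where a shared table built before the workers start needs no lock, and each worker writes only to its own slot so no result-combining synchronization is needed either:

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

class ReadOnlySharing
{
    static void Main()
    {
        // Built fully before any worker starts and never written again,
        // so every thread may read it concurrently without a lock.
        int[] lookup = Enumerable.Range(0, 1000).ToArray();

        long[] partial = new long[4];
        Parallel.For(0, 4, worker =>
        {
            long sum = 0;
            for (int i = worker; i < lookup.Length; i += 4)
                sum += lookup[i];  // lock-free read of immutable data
            partial[worker] = sum; // each worker owns exactly one slot
        });

        // Combine once at the end, on a single thread.
        Console.WriteLine(partial.Sum()); // 0 + 1 + ... + 999 = 499500
    }
}
```

Deferring the combine step until after the parallel region removes all contention from the hot loop.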
How the work is split up can also be a problem. Say a single-threaded approach walks sequentially through a region of memory, but a multithreaded approach hands each thread the next bit of memory round-robin. There will be far more cache eviction per core, because the "next bit" that each core would naturally have in cache is actually being used by another core. In this situation, splitting the work into large contiguous chunks fixes it.
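A sketch of the large-chunk approach using range partitioning, which hands each worker one contiguous slice of the array rather than interleaving elements across threads:

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

class ChunkedPartition
{
    static void Main()
    {
        double[] data = new double[1_000_000];

        // Partitioner.Create(from, to) yields large contiguous ranges,
        // so neighbouring elements stay in one core's cache instead of
        // being interleaved round-robin across cores.
        var ranges = Partitioner.Create(0, data.Length);
        Parallel.ForEach(ranges, range =>
        {
            for (int i = range.Item1; i < range.Item2; i++)
                data[i] = Math.Sqrt(i); // writes stay within this slice
        });

        Console.WriteLine(data[999_999]); // ~999.9995
    }
}
```

Each worker touches a disjoint contiguous region, so there is no false sharing between cores and hardware prefetching works as it would in the single-threaded version.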
There are many other factors that can make a multithreaded approach worse than a single-threaded one, but these are the ones that come to mind right away.
Edit: if you are writing to a shared store, it's worth trying a run where you simply throw away the results. That can narrow down where the problem is.