Client-side .NET concurrency performance

I am writing a .NET client application that is expected to use a lot of threads. I have been warned that .NET performance is very poor when it comes to concurrency. Although I am not writing a real-time application, I want to make sure that my application is scalable (i.e. it can handle many threads) and is at least comparable to an equivalent C++ application.

What is your experience? What is a suitable reference?

+4
5 answers

I put together a quick-and-dirty benchmark in C# using prime generation as the test. The test generates primes up to a constant limit (I chose 500,000) using a simple implementation of the Sieve of Eratosthenes, and repeats the run 800 times, parallelized across a given number of threads, using either the .NET ThreadPool or standalone threads.

The test was performed on a Q6600 quad-core processor running Windows Vista (x64). It does not use the Task Parallel Library, just plain threads. It was run for the following scenarios:

  • Sequential execution (no threading)
  • 4 threads (i.e. one per core) using the ThreadPool
  • 40 threads using the ThreadPool (to test the overhead of the pool itself)
  • 4 standalone threads
  • 40 standalone threads (to stress context switching)

Results:

 Test | Threads | ThreadPool | Time
 -----+---------+------------+------------------
   1  |    1    |   False    | 00:00:17.9508817
   2  |    4    |   True     | 00:00:05.1382026
   3  |   40    |   True     | 00:00:05.3699521
   4  |    4    |   False    | 00:00:05.2591492
   5  |   40    |   False    | 00:00:05.0976274

Several conclusions can be drawn from this:

  • Parallelization is not perfect (as expected; it never is, regardless of the environment), but spreading the load across 4 cores yields roughly 3.5x the throughput, which is hardly worth complaining about.

  • There was little difference between 4 and 40 threads using the ThreadPool, meaning there is no significant cost to the pool itself, even if you flood it with requests.

  • There was little difference between the ThreadPool and free-threaded versions, meaning the ThreadPool carries no significant "fixed" cost.

  • There was little difference between the 4-thread and 40-thread free-threaded versions, meaning that .NET does no worse than you would expect even under heavy context switching.

Do we even need a C++ test for comparison? The results are pretty clear: threads in .NET are not slow. Unless you, the programmer, write poor multi-threaded code and end up with resource starvation or lock convoys, you really don't have much to worry about.

With .NET 4.0, the TPL, and the improvements to the ThreadPool, work-stealing queues and all that good stuff, you have even more room to write mediocre code that still performs efficiently. You don't get those features for free in C++.
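As an aside, the manual fan-out that the benchmark does by hand (worker count, ManualResetEvent, Interlocked countdown) collapses to a few lines with the TPL. This is only a sketch, with a trivial stand-in for the prime-sieve workload:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

class TplSketch
{
    public static int RunAll(int iterations)
    {
        int completed = 0;

        // Parallel.For partitions the iterations across the ThreadPool's
        // work-stealing queues; no manual thread or event bookkeeping needed.
        Parallel.For(0, iterations, i =>
        {
            // Stand-in for one TestPrimes run from the benchmark.
            Interlocked.Increment(ref completed);
        });

        return completed;
    }

    static void Main()
    {
        Console.WriteLine(RunAll(800)); // prints 800
    }
}
```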

For reference, here is the test code:

 using System;
 using System.Collections.Generic;
 using System.Diagnostics;
 using System.Runtime.CompilerServices;
 using System.Threading;

 namespace ThreadingTest
 {
     class Program
     {
         private static int PrimeMax = 500000;
         private static int TestRunCount = 800;

         static void Main(string[] args)
         {
             Console.WriteLine("Test | Threads | ThreadPool | Time");
             Console.WriteLine("-----+---------+------------+--------");
             RunTest(1, 1, false);
             RunTest(2, 4, true);
             RunTest(3, 40, true);
             RunTest(4, 4, false);
             RunTest(5, 40, false);
             Console.WriteLine("Done!");
             Console.ReadLine();
         }

         static void RunTest(int sequence, int threadCount, bool useThreadPool)
         {
             TimeSpan duration = Time(() => GeneratePrimes(threadCount, useThreadPool));
             Console.WriteLine("{0} | {1} | {2} | {3}",
                 sequence.ToString().PadRight(4),
                 threadCount.ToString().PadRight(7),
                 useThreadPool.ToString().PadRight(10),
                 duration);
         }

         static TimeSpan Time(Action action)
         {
             Stopwatch sw = new Stopwatch();
             sw.Start();
             action();
             sw.Stop();
             return sw.Elapsed;
         }

         static void GeneratePrimes(int threadCount, bool useThreadPool)
         {
             if (threadCount == 1)
             {
                 TestPrimes(TestRunCount);
                 return;
             }

             int testsPerThread = TestRunCount / threadCount;
             int remaining = threadCount;
             using (ManualResetEvent finishedEvent = new ManualResetEvent(false))
             {
                 for (int i = 0; i < threadCount; i++)
                 {
                     Action testAction = () =>
                     {
                         TestPrimes(testsPerThread);
                         if (Interlocked.Decrement(ref remaining) == 0)
                         {
                             finishedEvent.Set();
                         }
                     };

                     if (useThreadPool)
                     {
                         ThreadPool.QueueUserWorkItem(s => testAction());
                     }
                     else
                     {
                         ThreadStart ts = new ThreadStart(testAction);
                         Thread th = new Thread(ts);
                         th.Start();
                     }
                 }
                 finishedEvent.WaitOne();
             }
         }

         [MethodImpl(MethodImplOptions.NoOptimization)]
         static void IteratePrimes(IEnumerable<int> primes)
         {
             int count = 0;
             foreach (int prime in primes)
             {
                 count++;
             }
         }

         static void TestPrimes(int testRuns)
         {
             for (int t = 0; t < testRuns; t++)
             {
                 var primes = Primes.GenerateUpTo(PrimeMax);
                 IteratePrimes(primes);
             }
         }
     }
 }

And here is the prime generator itself:

 using System;
 using System.Collections.Generic;
 using System.Linq;

 namespace ThreadingTest
 {
     public class Primes
     {
         public static IEnumerable<int> GenerateUpTo(int maxValue)
         {
             if (maxValue < 2)
                 return Enumerable.Empty<int>();

             bool[] primes = new bool[maxValue + 1];
             for (int i = 2; i <= maxValue; i++)
                 primes[i] = true;

             for (int i = 2; i < Math.Sqrt(maxValue + 1) + 1; i++)
             {
                 if (primes[i])
                 {
                     for (int j = i * i; j <= maxValue; j += i)
                         primes[j] = false;
                 }
             }

             return Enumerable.Range(2, maxValue - 1).Where(i => primes[i]);
         }
     }
 }

If you see any obvious flaws in the test, let me know. Barring any serious problems with the test itself, the results speak for themselves, and the message is clear:

Do not listen to anyone who makes overly broad, unqualified statements about how the performance of .NET, or any other language/environment, is "bad" in some particular area, because they are probably talking out of their ... back ends.

+12

You might want to take a look at System.Threading.Tasks, introduced in .NET 4.

They provide a scalable way to use threads via tasks, with a really neat work-stealing mechanism behind the scenes.
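A minimal illustration of tasks (assuming .NET 4): Task.Factory.StartNew queues work onto the ThreadPool's work-stealing queues, and Task.WaitAll blocks until everything has finished. The names and the trivial workload here are purely illustrative:

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

class TaskDemo
{
    public static int CountWithTasks(int taskCount)
    {
        int total = 0;

        // Each task lands on a work-stealing queue; idle worker threads
        // steal from busy ones, so uneven workloads still balance out.
        Task[] tasks = Enumerable.Range(0, taskCount)
            .Select(_ => Task.Factory.StartNew(() =>
                Interlocked.Increment(ref total)))
            .ToArray();

        Task.WaitAll(tasks);
        return total;
    }

    static void Main()
    {
        Console.WriteLine(CountWithTasks(40)); // prints 40
    }
}
```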

By the way, I don't know who told you that .NET is not suited to concurrency. All my applications use threads at some point or another. But don't forget that running 10 threads on a 2-core processor can be counter-productive for performance (it depends on the type of task; if you are waiting on network resources, it might make sense).

In any case, do not be afraid of .NET where performance is concerned; it really is not bad.

+9

This is a myth. .NET does a great job of managing concurrency and is very scalable.

If you can, I would recommend using .NET 4 and the Task Parallel Library. It simplifies many concurrency issues. For more information, have a look at the MSDN Parallel Computing with Managed Code developer center.

If you are interested in implementation details, I also have a very detailed series on Parallelism in .NET .

+7

The concurrency performance of .NET will be pretty close to that of applications written in native code. System.Threading is a very thin layer on top of the native threading API.

Whoever warned you may have noticed that, because multi-threaded applications are much easier to write in .NET, they are sometimes written by less experienced programmers who do not fully understand concurrency; but that is not a technical limitation.

If anecdotal evidence helps: at my last job we wrote a highly parallel trading application that processed more than 20,000 market data events per second and updated a massive "main form" grid with the corresponding data, all through a fairly elaborate threading architecture, and all in C# and VB.NET. Due to the complexity of the application we optimized many areas, but we never saw a benefit in rewriting the threading code in native C++.

+4

You should seriously consider first whether you need a lot of threads or just a few. It is not that .NET threads are slow; threads are slow, period. Task switching is an expensive operation, regardless of who implemented it.

This is a place, like many others, where design patterns can help. There are already good answers touching on this, so I will just make it explicit: you are better off using a command pattern with a few worker threads, completing each piece of work as quickly as possible in sequence, than spinning up a bunch of threads to do a pile of work "in parallel" that is not actually parallel, but rather chopped into small pieces interleaved by the scheduler.

In other words: you are better off dividing the work into meaningful units, using your own knowledge to decide where the boundaries between units of value lie, than relying on a general-purpose solution such as the operating system's scheduler.
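A sketch of that idea, using a BlockingCollection (available since .NET 4) as the command queue and roughly one long-lived worker per core; the class and method names here are illustrative, not from the answer above:

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

class CommandQueue
{
    public static int Process(int workItems)
    {
        var queue = new BlockingCollection<Action>();
        int done = 0;

        // A small fixed pool of long-lived workers, roughly one per core,
        // each draining the queue and running commands sequentially.
        var workers = new Task[Environment.ProcessorCount];
        for (int i = 0; i < workers.Length; i++)
        {
            workers[i] = Task.Factory.StartNew(() =>
            {
                foreach (Action command in queue.GetConsumingEnumerable())
                    command();
            }, TaskCreationOptions.LongRunning);
        }

        for (int i = 0; i < workItems; i++)
            queue.Add(() => Interlocked.Increment(ref done));

        queue.CompleteAdding();  // no more work; workers drain and exit
        Task.WaitAll(workers);
        return done;
    }

    static void Main()
    {
        Console.WriteLine(Process(100)); // prints 100
    }
}
```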

+3

Source: https://habr.com/ru/post/1304800/

