I put together a quick and dirty benchmark in C# using a simple prime generator as the test. The test generates primes up to a constant limit (I chose 500,000) using a simple Sieve of Eratosthenes implementation and repeats the test 800 times, parallelized over a given number of threads, using either the .NET ThreadPool or standalone threads.
The test was run on a Q6600 quad-core processor running Windows Vista (x64). It does not use the Task Parallel Library, just simple threads. It was run for the following scenarios:
- Sequential execution (no threading)
- 4 threads (i.e. one per core), using the ThreadPool
- 40 threads, using the ThreadPool (to test the efficiency of the pool itself)
- 4 standalone threads
- 40 standalone threads, to simulate context switching
Results:
    Test | Threads | ThreadPool | Time
    -----+---------+------------+------------------
    1    | 1       | False      | 00:00:17.9508817
    2    | 4       | True       | 00:00:05.1382026
    3    | 40      | True       | 00:00:05.3699521
    4    | 4       | False      | 00:00:05.2591492
    5    | 40      | False      | 00:00:05.0976274
A few conclusions can be drawn from this:
- Parallelization is not perfect (as expected; it never is, regardless of environment), but splitting the load across 4 cores gives about 3.5 times the throughput, which is hardly anything to complain about.
- There was a negligible difference between 4 and 40 threads using the ThreadPool, which means the pool incurs no significant cost, even when you flood it with requests.
- There was a negligible difference between the ThreadPool and free-threaded versions, which means the ThreadPool has no significant "fixed" cost.
- There was a negligible difference between the 4-thread and 40-thread free-threaded versions, which means .NET performs no worse than you would expect under heavy context switching.
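The "about 3.5 times" figure in the first conclusion is just the sequential time divided by the 4-thread ThreadPool time from the results table. A minimal sketch of that arithmetic follows; the class and method names are purely illustrative and not part of the benchmark:

```csharp
using System;

static class SpeedupCheck
{
    // Speedup = sequential time / parallel time (both in seconds).
    public static double Speedup(double sequentialSeconds, double parallelSeconds)
    {
        return sequentialSeconds / parallelSeconds;
    }

    // Parallel efficiency = speedup / number of cores.
    public static double Efficiency(double speedup, int cores)
    {
        return speedup / cores;
    }

    static void Main()
    {
        double sequential = 17.9508817; // test 1 from the results table
        double pooled4 = 5.1382026;     // test 2 from the results table

        double speedup = Speedup(sequential, pooled4);
        Console.WriteLine("Speedup:    {0:F2}", speedup);
        Console.WriteLine("Efficiency: {0:F2}", Efficiency(speedup, 4));
    }
}
```

With these numbers the speedup comes out at roughly 3.49, i.e. a bit over 87% efficiency on 4 cores.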
Do we even need a C++ benchmark for comparison? The results are pretty clear: threads in .NET are not slow. Unless you, the programmer, write poor multi-threaded code and end up with resource starvation or lock convoys, you really don't need to worry.
With .NET 4.0, the TPL, the improvements to the ThreadPool, work-stealing queues and all that cool stuff, you have even more leeway to write "questionable" code and still have it run efficiently. You do not get these features at all from C++.
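For illustration, here is a rough sketch of how the same kind of workload might be expressed with the TPL's `Parallel.For`. It was not part of the benchmark and has not been timed; `CountPrimesUpTo` and `RunAll` are hypothetical stand-ins, not the test's own code:

```csharp
using System;
using System.Diagnostics;
using System.Threading;
using System.Threading.Tasks;

static class TplSketch
{
    // Hypothetical stand-in for the benchmark workload: sieve, then count.
    public static int CountPrimesUpTo(int maxValue)
    {
        if (maxValue < 2) return 0;
        bool[] composite = new bool[maxValue + 1];
        int count = 0;
        for (int i = 2; i <= maxValue; i++)
        {
            if (composite[i]) continue;
            count++;
            // long avoids overflow of i * i for large limits.
            for (long j = (long)i * i; j <= maxValue; j += i)
                composite[j] = true;
        }
        return count;
    }

    // Runs the workload 'runs' times; Parallel.For decides how the
    // iterations are partitioned across pool threads.
    public static int RunAll(int runs, int maxValue)
    {
        int completed = 0;
        Parallel.For(0, runs, i =>
        {
            CountPrimesUpTo(maxValue);
            Interlocked.Increment(ref completed);
        });
        return completed;
    }

    static void Main()
    {
        Stopwatch sw = Stopwatch.StartNew();
        int completed = RunAll(800, 500000);
        sw.Stop();
        Console.WriteLine("{0} runs in {1}", completed, sw.Elapsed);
    }
}
```

The point of the sketch is only that the thread count and the completion signalling disappear from the calling code; whether it is faster or slower than the hand-rolled versions would have to be measured.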
For reference, here is the test code:
    using System;
    using System.Collections.Generic;
    using System.Diagnostics;
    using System.Runtime.CompilerServices;
    using System.Threading;

    namespace ThreadingTest
    {
        class Program
        {
            private static int PrimeMax = 500000;
            private static int TestRunCount = 800;

            static void Main(string[] args)
            {
                Console.WriteLine("Test | Threads | ThreadPool | Time");
                Console.WriteLine("-----+---------+------------+--------");
                RunTest(1, 1, false);
                RunTest(2, 4, true);
                RunTest(3, 40, true);
                RunTest(4, 4, false);
                RunTest(5, 40, false);
                Console.WriteLine("Done!");
                Console.ReadLine();
            }

            static void RunTest(int sequence, int threadCount, bool useThreadPool)
            {
                TimeSpan duration = Time(() => GeneratePrimes(threadCount, useThreadPool));
                Console.WriteLine("{0} | {1} | {2} | {3}",
                    sequence.ToString().PadRight(4),
                    threadCount.ToString().PadRight(7),
                    useThreadPool.ToString().PadRight(10),
                    duration);
            }

            static TimeSpan Time(Action action)
            {
                Stopwatch sw = new Stopwatch();
                sw.Start();
                action();
                sw.Stop();
                return sw.Elapsed;
            }

            static void GeneratePrimes(int threadCount, bool useThreadPool)
            {
                // Sequential case: run everything on the calling thread.
                if (threadCount == 1)
                {
                    TestPrimes(TestRunCount);
                    return;
                }

                int testsPerThread = TestRunCount / threadCount;
                int remaining = threadCount;
                using (ManualResetEvent finishedEvent = new ManualResetEvent(false))
                {
                    for (int i = 0; i < threadCount; i++)
                    {
                        // The last worker to finish signals the event.
                        Action testAction = () =>
                        {
                            TestPrimes(testsPerThread);
                            if (Interlocked.Decrement(ref remaining) == 0)
                            {
                                finishedEvent.Set();
                            }
                        };

                        if (useThreadPool)
                        {
                            ThreadPool.QueueUserWorkItem(s => testAction());
                        }
                        else
                        {
                            ThreadStart ts = new ThreadStart(testAction);
                            Thread th = new Thread(ts);
                            th.Start();
                        }
                    }
                    finishedEvent.WaitOne();
                }
            }

            // NoOptimization keeps the JIT from eliding the enumeration.
            [MethodImpl(MethodImplOptions.NoOptimization)]
            static void IteratePrimes(IEnumerable<int> primes)
            {
                int count = 0;
                foreach (int prime in primes) { count++; }
            }

            static void TestPrimes(int testRuns)
            {
                for (int t = 0; t < testRuns; t++)
                {
                    var primes = Primes.GenerateUpTo(PrimeMax);
                    IteratePrimes(primes);
                }
            }
        }
    }
And here is the prime generator:
    using System;
    using System.Collections.Generic;
    using System.Linq;

    namespace ThreadingTest
    {
        public class Primes
        {
            public static IEnumerable<int> GenerateUpTo(int maxValue)
            {
                if (maxValue < 2)
                    return Enumerable.Empty<int>();

                // primes[i] stays true only while i is still a prime candidate.
                bool[] primes = new bool[maxValue + 1];
                for (int i = 2; i <= maxValue; i++)
                    primes[i] = true;

                for (int i = 2; i < Math.Sqrt(maxValue + 1) + 1; i++)
                {
                    if (primes[i])
                    {
                        // Cross off multiples of i, starting at i squared.
                        for (int j = i * i; j <= maxValue; j += i)
                            primes[j] = false;
                    }
                }

                return Enumerable.Range(2, maxValue - 1).Where(i => primes[i]);
            }
        }
    }
If you see any obvious flaws in the test, let me know. Barring any serious problems with the test itself, I think the results speak for themselves, and the message is clear:
Do not listen to anyone who makes overly broad and unqualified statements about how the performance of .NET or any other language / environment is "bad" in some particular area, because they are probably talking out of their ... rear ends.