Is Interlocked.Exchange<T> slower than Interlocked.CompareExchange<T>?

I came across some odd performance results while optimizing a program. They are reproduced by the following BenchmarkDotNet test:

    string _s, _y = "yo";

    [Benchmark]
    public void Exchange() => Interlocked.Exchange(ref _s, null);

    [Benchmark]
    public void CompareExchange() => Interlocked.CompareExchange(ref _s, _y, null);

The results are as follows:

    BenchmarkDotNet=v0.10.10, OS=Windows 10 Redstone 3 [1709, Fall Creators Update] (10.0.16299.192)
    Processor=Intel Core i7-6700HQ CPU 2.60GHz (Skylake), ProcessorCount=8
    Frequency=2531248 Hz, Resolution=395.0620 ns, Timer=TSC
    .NET Core SDK=2.1.4
      [Host]     : .NET Core 2.0.5 (Framework 4.6.26020.03), 64bit RyuJIT
      DefaultJob : .NET Core 2.0.5 (Framework 4.6.26020.03), 64bit RyuJIT

              Method |      Mean |     Error |    StdDev |
    ---------------- |----------:|----------:|----------:|
            Exchange | 20.525 ns | 0.4357 ns | 0.4662 ns |
     CompareExchange |  7.017 ns | 0.1070 ns | 0.1001 ns |

It would seem that Interlocked.Exchange is more than twice as slow as Interlocked.CompareExchange, which is confusing because it should be doing less work. If I am not mistaken, both should map to single CPU operations.

Does anyone have a good explanation for why this happens? Is this an actual performance difference at the CPU level, or is it a problem with how .NET Core wraps these operations?

If so, is it best to just avoid Interlocked.Exchange() and use Interlocked.CompareExchange() when possible?
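For context on that last point, the two calls are not interchangeable in general: Exchange always stores the new value, while CompareExchange stores it only when the current value matches the comparand. Below is a minimal sketch of the difference; the class and helper names (ExchangeSemantics, Swap, SwapIf, Current) are made up for illustration and are not part of the benchmark above.

```csharp
using System;
using System.Threading;

public static class ExchangeSemantics
{
    private static string _s;

    // Unconditional swap: always stores newValue and returns the previous value.
    public static string Swap(string newValue) =>
        Interlocked.Exchange(ref _s, newValue);

    // Conditional swap: stores newValue only if _s currently holds the same
    // reference as comparand. Returns the value seen before the call either way.
    public static string SwapIf(string newValue, string comparand) =>
        Interlocked.CompareExchange(ref _s, newValue, comparand);

    public static string Current => Volatile.Read(ref _s);

    public static void Main()
    {
        Swap("a");
        Console.WriteLine(Swap("b"));                    // prints "a"; _s is now "b"
        Console.WriteLine(SwapIf("c", null));            // prints "b"; _s was not null, nothing stored
        Console.WriteLine(Current);                      // prints "b"

        Swap(null);
        Console.WriteLine(SwapIf("c", null) ?? "null");  // prints "null"; _s was null, "c" stored
        Console.WriteLine(Current);                      // prints "c"
    }
}
```

In the benchmark, this means CompareExchange only writes on the first invocation (when _s is still null), whereas Exchange writes on every invocation, so the two methods are not measuring the same workload.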

EDIT: Another weird thing: when I run the same tests with int or long instead of string, both methods take more or less the same time. In addition, I used the BenchmarkDotNet disassembly diagnoser to look at the generated assembly and found something interesting: with the int / long versions I can clearly see the xchg and cmpxchg instructions, but with string I see calls into the Interlocked.Exchange / Interlocked.CompareExchange methods instead!

EDIT2: Open issue in coreclr: https://github.com/dotnet/coreclr/issues/16051

1 answer

Following up on my comments, this seems to be a problem with the generic overload of Exchange.

If you avoid the generic overload altogether (by changing the type of _s and _y to object), the performance difference disappears.

The question remains why resolving to the generic overload slows down only Exchange. After reading the Interlocked source code, it seems that a hack was implemented for CompareExchange<T> to make it faster. The source code comment on CompareExchange<T> follows:
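A minimal sketch of that workaround, assuming the fields are simply redeclared as object. The class and method names (ObjectOverloadSketch, Swap, SetIfNull) are hypothetical, and the timing claim comes from the measurements discussed here on .NET Core 2.0, not from this snippet.

```csharp
using System;
using System.Threading;

public static class ObjectOverloadSketch
{
    // Declared as object (not string) so that overload resolution picks the
    // non-generic Exchange(ref object, object) and
    // CompareExchange(ref object, object, object) overloads.
    private static object _s;
    private static readonly object _y = "yo";

    // Binds to Interlocked.Exchange(ref object, object).
    public static object Swap() => Interlocked.Exchange(ref _s, null);

    // Binds to Interlocked.CompareExchange(ref object, object, object).
    public static object SetIfNull() => Interlocked.CompareExchange(ref _s, _y, null);

    public static void Main()
    {
        Console.WriteLine(Swap() ?? "null");       // prints "null"; _s started out null
        Console.WriteLine(SetIfNull() ?? "null");  // prints "null"; _s was null, so "yo" is stored
        Console.WriteLine(Swap() ?? "null");       // prints "yo"; _s is reset to null
    }
}
```

The cost of this approach is that callers have to cast the result back to string themselves, which is exactly what the generic overloads would otherwise do for them.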

     * CompareExchange<T>
     *
     * Notice how CompareExchange<T>() uses the __makeref keyword
     * to create two TypedReferences before calling _CompareExchange().
     * This is horribly slow. Ideally we would like CompareExchange<T>()
     * to simply call CompareExchange(ref Object, Object, Object);
     * however, this would require casting a "ref T" into a "ref Object",
     * which is not legal in C#.
     *
     * Thus we opted to cheat, and hacked to JIT so that when it reads
     * the method body for CompareExchange<T>() it gets back the
     * following IL:
     *
     *     ldarg.0
     *     ldarg.1
     *     ldarg.2
     *     call System.Threading.Interlocked::CompareExchange(ref Object, Object, Object)
     *     ret
     *
     * See getILIntrinsicImplementationForInterlocked() in VM\JitInterface.cpp
     * for details.

There is no comment of this kind on Exchange<T>, and it also uses the "horribly slow" __makeref, so that might be the reason you see this unexpected behavior.

All of this is, of course, my own interpretation; you would need someone from the .NET team to confirm my suspicions.


Source: https://habr.com/ru/post/1274976/

