C# volatile variables: memory fences vs. caching

So, I have studied this topic for quite some time, and I think I understand the most important concepts, such as release and acquire memory fences.

However, I have not found a satisfactory explanation of the relationship between volatile and main-memory caching.

So, I understand that every read of and write to a volatile field imposes strict ordering on the reads and writes that precede and follow it (read-acquire and write-release). But that only guarantees the ordering of operations. It says nothing about when those changes become visible to other threads/processors. In particular, that depends on when (if ever) the cache is flushed. I remember reading a comment by Eric Lippert saying something like "the presence of volatile fields automatically disables cache optimizations". But I'm not sure what exactly that means. Does it mean that caching is completely disabled for the entire program just because we have a single volatile field somewhere? If not, at what granularity is caching disabled?

In addition, I read something about strong and weak volatile semantics, and that C# follows the strong semantics, where every write always goes straight to main memory regardless of whether the field is volatile or not. I am very confused about all of this.

3 answers

First, I'll address the last question. Microsoft's .NET implementation has release semantics on all writes¹. It is not C# per se, so the same program, whatever the language, may have weak non-volatile writes under a different implementation.

The visibility of side effects is about multiple threads. Forget about processors, cores and caches. Imagine, instead, that each thread has its own snapshot of what is on the heap, and that some kind of synchronization is required to propagate side effects between threads.

So what does C# say about this? The C# language specification (and its newer drafts) says essentially the same as the Common Language Infrastructure standard (CLI; ECMA-335 and ISO/IEC 23271), with a few differences. I'll talk about them later.

So what does the CLI say? That only volatile operations constitute visible side effects.

Note that it also states that non-volatile operations on the heap are side effects as well, but they are not guaranteed to be visible. Just as importantly², it does not state that they are guaranteed not to be visible.

What exactly happens with volatile operations? A volatile read has acquire semantics: it precedes any subsequent memory reference. A volatile write has release semantics: it follows any preceding memory reference.

Acquiring a lock performs a volatile read, and releasing a lock performs a volatile write.
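In other words (a minimal sketch of my own, not from the original answer), a lock statement brackets the protected region with those semantics:

    public class LockedCounter
    {
        private readonly object gate = new object();
        private int count;

        public void Increment()
        {
            lock (gate)   // Monitor.Enter: acquire, like a volatile read
            {
                count++;  // protected accesses cannot leak out of the lock
            }             // Monitor.Exit: release, like a volatile write
        }
    }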

Interlocked operations have both acquire and release semantics.

There is another important term to know, which is atomicity.

Reads and writes, volatile or not, are guaranteed to be atomic for primitive values up to 32 bits on 32-bit architectures and up to 64 bits on 64-bit architectures. They are also guaranteed to be atomic for references. For other types, such as large structs, operations are not atomic; they may require multiple, independent memory accesses.

However, even with volatile semantics, read-update-write operations such as v += 1, or the equivalent ++v (or v++, which is equivalent in terms of side effects), are not atomic.

Interlocked operations guarantee atomicity for certain operations, typically addition, subtraction, and compare-and-swap (CAS), i.e. write some new value if and only if the current value is still a certain expected value. .NET also has Interlocked.Read(ref long), an atomic 64-bit read that works even on 32-bit architectures.
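To make the atomicity point concrete, here is a small sketch (my own example; the class and field names are not from the answer) contrasting a racy v++ with Interlocked.Increment:

    using System;
    using System.Threading;

    public static class AtomicityDemo
    {
        private static int unsafeCounter;
        private static int safeCounter;

        public static void Main()
        {
            var threads = new Thread[4];
            for (int i = 0; i < threads.Length; i++)
            {
                threads[i] = new Thread(() =>
                {
                    for (int j = 0; j < 100000; j++)
                    {
                        unsafeCounter++;                        // read-update-write: not atomic, updates may be lost
                        Interlocked.Increment(ref safeCounter); // atomic read-update-write
                    }
                });
                threads[i].Start();
            }
            foreach (var t in threads) t.Join();

            // safeCounter is exactly 400000; unsafeCounter is typically less.
            Console.WriteLine($"unsafe: {unsafeCounter}, safe: {safeCounter}");
        }
    }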

From here on, I'll refer to acquire semantics as volatile reads and to release semantics as volatile writes, and to either or both as volatile operations.

What does all this mean in terms of ordering?

That a volatile read is a point before which no memory references may cross, and a volatile write is a point after which no memory references may cross, both at the language level and at the machine level.

That non-volatile operations may cross to after a subsequent volatile read, provided there are no volatile writes in between, and may cross to before a preceding volatile write, provided there are no volatile reads in between.

That volatile operations within a thread are sequential and cannot be reordered with respect to one another.
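To visualize the fence directions, here is an annotated sketch (my own example) of which movements the rules above forbid:

    public class ReorderingSketch
    {
        private int a, b;           // ordinary, non-volatile fields
        private volatile bool flag; // volatile field

        public void Producer()
        {
            a = 1;       // non-volatile writes...
            b = 2;
            flag = true; // ...cannot cross to after this volatile write (release)
        }

        public void Consumer()
        {
            if (flag)            // volatile read (acquire)
            {
                int sum = a + b; // these non-volatile reads cannot cross to before the read of flag
            }
        }
    }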

That volatile operations in a thread become visible to all other threads in that same order. However, there is no total order of volatile operations across all threads; that is, if one thread performs V1 and then V2, and another thread performs V3 and then V4, then any order that has V1 before V2 and V3 before V4 can be observed by any thread. In this case, it may be any of the following:

  • V1 V2 V3 V4

  • V1 V3 V2 V4

  • V1 V3 V4 V2

  • V3 V1 V2 V4

  • V3 V1 V4 V2

  • V3 V4 V1 V2

That is, any of the possible orders of observed side effects is valid for any observing thread in a single execution. There is no total-ordering requirement, such that all threads would observe only one of the possible orders in a single execution.
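As a sketch of that scenario (the field and method names are mine), consider two writer threads and an observer; the observer may see any of the six interleavings listed above:

    public class NoTotalOrder
    {
        public volatile int v1, v2, v3, v4;

        public void ThreadA()
        {
            v1 = 1; // V1
            v2 = 1; // V2: always observed after V1
        }

        public void ThreadB()
        {
            v3 = 1; // V3
            v4 = 1; // V4: always observed after V3
        }

        public void Observer()
        {
            // Any interleaving that keeps V1 before V2 and V3 before V4
            // may be observed, e.g. v3 == 1 while v2 == 0, or vice versa.
            int s1 = v1, s2 = v2, s3 = v3, s4 = v4;
        }
    }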

How are things synchronized?

Essentially, it comes down to this: a synchronization point is where you have a volatile read that happens after a volatile write.

In practice, you must detect whether a volatile read in one thread happened after a volatile write in another thread³. Here's a basic example:

    public class InefficientEvent
    {
        private volatile bool signalled = false;

        public void Signal()
        {
            signalled = true;
        }

        public void InefficientWait()
        {
            while (!signalled)
            {
            }
        }
    }

However generally inefficient, you can start two different threads, such that one calls InefficientWait() and the other calls Signal(), and the side effects the latter performed before returning from Signal() become visible to the former when it returns from InefficientWait().
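Here is a usage sketch of that pattern (the driver code and the payload variable are mine, not from the answer):

    using System;
    using System.Threading;

    public static class InefficientEventDemo
    {
        public static void Main()
        {
            var evt = new InefficientEvent();
            int payload = 0; // ordinary, non-volatile data

            var waiter = new Thread(() =>
            {
                evt.InefficientWait();      // volatile reads (acquire)
                Console.WriteLine(payload); // guaranteed to print 42
            });
            waiter.Start();

            payload = 42; // ordinary write, made visible by...
            evt.Signal(); // ...the volatile write (release)
            waiter.Join();
        }
    }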

Volatile accesses are not as generally useful as interlocked accesses, which in turn are not as generally useful as synchronization primitives. My advice is that you should develop code using synchronization primitives first (locks, semaphores, mutexes, events, etc.), and if you find reasons to optimize based on actual data (e.g. profiling), then and only then see whether you can improve things.

If you ever reach high contention on fast locks (ones held only for a few reads and writes, with no blocking inside), then depending on the amount of contention, switching to interlocked operations may either improve or worsen performance. Especially so when you have to resort to compare-and-swap loops, such as:

    var currentValue = Volatile.Read(ref field);
    var newValue = GetNewValue(currentValue);
    var oldValue = currentValue;
    var spinWait = new SpinWait();
    while ((currentValue = Interlocked.CompareExchange(ref field, newValue, oldValue)) != oldValue)
    {
        spinWait.SpinOnce();
        newValue = GetNewValue(currentValue);
        oldValue = currentValue;
    }

Meaning, you should profile this solution as well and compare it against the status quo. And beware of the ABA problem.
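For instance, a loop like the one above can be wrapped in a reusable helper (a sketch; InterlockedApply is my own name, not a BCL method):

    using System;
    using System.Threading;

    public static class AtomicHelpers
    {
        // Atomically replaces field with transform(current) via a CAS loop.
        public static int InterlockedApply(ref int field, Func<int, int> transform)
        {
            var spinWait = new SpinWait();
            var currentValue = Volatile.Read(ref field);
            while (true)
            {
                var newValue = transform(currentValue);
                var witnessed = Interlocked.CompareExchange(ref field, newValue, currentValue);
                if (witnessed == currentValue)
                    return newValue;      // the CAS succeeded
                spinWait.SpinOnce();      // back off, then retry with the freshly observed value
                currentValue = witnessed;
            }
        }
    }

Note that transform may run several times under contention, so it should be side-effect free; and, as noted above, beware the ABA problem when a compared value can be recycled.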

There's also SpinLock, which you really must profile against monitor-based locks, because although it may yield the current thread, it doesn't put the current thread to sleep, akin to the SpinWait usage shown above.
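A minimal sketch of that trade-off (my own example), using SpinLock where a monitor-based lock would otherwise go:

    using System.Threading;

    public class SpinLockedCounter
    {
        private SpinLock spinLock = new SpinLock(enableThreadOwnerTracking: false);
        private int count;

        public void Increment()
        {
            bool lockTaken = false;
            try
            {
                spinLock.Enter(ref lockTaken); // spins (and may yield) instead of sleeping
                count++;
            }
            finally
            {
                if (lockTaken) spinLock.Exit();
            }
        }
    }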

Switching to volatile operations is playing with fire. You must make sure, through analytical proof, that your code is correct, otherwise you may get burned when you least expect it.

Usually, the best approach to optimization under high contention is to avoid contention. For example, to perform a transformation on a big list in parallel, it's often better to divide and delegate the problem to several work items that each generate a partial result, combined in a final step, than to have multiple threads locking the list for updates. This has a memory cost, so it depends on the size of the data set.
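A sketch of that divide-and-combine approach (my own example, assuming the per-item work is independent):

    using System;
    using System.Threading.Tasks;

    public static class PartitionDemo
    {
        public static int[] SquareAll(int[] input)
        {
            int workers = Environment.ProcessorCount;
            var output = new int[input.Length];
            var tasks = new Task[workers];

            for (int w = 0; w < workers; w++)
            {
                int start = input.Length * w / workers;
                int end = input.Length * (w + 1) / workers;
                tasks[w] = Task.Run(() =>
                {
                    // Each worker writes only to its own slice: no shared-state contention, no locks.
                    for (int i = start; i < end; i++)
                        output[i] = input[i] * input[i];
                });
            }

            Task.WaitAll(tasks);
            return output;
        }
    }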


What are the differences between the C# specification and the CLI specification regarding volatile operations?

C# defines side effects, without mentioning their inter-thread visibility, as reads and writes of volatile fields, writes to non-volatile variables, writes to external resources, and the throwing of exceptions.

C# defines the critical execution points at which these side effects are preserved between threads: references to volatile fields, lock statements, and thread creation and termination.

If we take critical execution points as the points where side effects become visible, this adds to the CLI specification the guarantee that thread creation and termination are visible side effects. That is, new Thread(...).Start() has release semantics on the current thread and acquire semantics at the start of the new thread, and exiting a thread has release semantics on the current thread while thread.Join() has acquire semantics on the waiting thread.
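A sketch of what those guarantees buy you (my own example): a non-volatile write made before Thread.Start is visible to the new thread, and the thread's writes are visible after Join:

    using System;
    using System.Threading;

    public static class StartJoinDemo
    {
        private static int data;   // deliberately non-volatile
        private static int result;

        public static void Main()
        {
            data = 42;             // ordered before Start (release)
            var t = new Thread(() =>
            {
                result = data + 1; // guaranteed to see data == 42 (acquire at thread start)
            });
            t.Start();
            t.Join();              // acquire: the thread's writes are visible from here on
            Console.WriteLine(result); // guaranteed to print 43
        }
    }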

C# doesn't mention volatile operations in general, such as those performed by the classes in System.Threading, as opposed to only those on fields declared as volatile and those performed by the lock statement. I believe this is unintentional.

C# states that captured variables can be simultaneously exposed to multiple threads. The CLI doesn't mention this, since closures are a language construct.
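For example (a sketch of my own), a local captured by a lambda is hoisted to the heap and can be shared across threads:

    using System;
    using System.Threading;

    public static class CaptureDemo
    {
        public static void Main()
        {
            int captured = 0; // looks like a local, but is hoisted to a heap-allocated closure object

            var t = new Thread(() => captured = 1); // the lambda shares the same storage
            t.Start();
            t.Join(); // Join makes the thread's write visible here

            Console.WriteLine(captured); // prints 1
        }
    }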


1.

There are several places where (ex-)Microsoft employees and MVPs state that all writes have release semantics.

In my own code, however, I ignore this implementation detail: I assume a non-volatile write is not guaranteed to become visible.


2.

There is a common misconception that read introduction is allowed in C# and/or the CLI.

However, it is only allowed for local variables and arguments.

For static and instance fields, array elements, or anything else on the heap, you cannot safely introduce reads, since such an introduction may break the order of execution as seen from the current thread of execution, whether due to legitimate changes from other threads or due to changes through reflection.

That is, you cannot turn this:

    object local = field;
    if (local != null)
    {
        // code that reads local
    }

into this:

    if (field != null)
    {
        // code that replaces reads on local with reads on field
    }

if you could ever tell the difference. Specifically, a NullReferenceException could be thrown by accessing local's members.

In the case of C#'s captured variables, they're equivalent to instance fields.
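Conceptually, the compiler lowers a capturing lambda to something like the following (a rough sketch; the real compiler-generated type has an unspeakable name such as <>c__DisplayClass0_0):

    // The captured "local" becomes an instance field on a compiler-generated class,
    // so reads of it follow the field rules above: no read introduction allowed.
    internal sealed class DisplayClassSketch
    {
        public object captured;

        public void Lambda()
        {
            // body of the lambda, reading and writing this.captured
        }
    }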

It is important to note that the CLI standard:

  • says non-volatile accesses are not guaranteed to be visible

  • does not say non-volatile accesses are guaranteed not to be visible

  • says volatile operations affect the visibility of non-volatile accesses

But you can turn this:

    object local2 = local1;
    if (local2 != null)
    {
        // code that reads local2 on the assumption it's not null
    }

into this:

    if (local1 != null)
    {
        // code that replaces reads on local2 with reads on local1,
        // as long as local1 and local2 have the same value
    }

You can turn this:

    var local = field;
    local?.Method();

into this:

    var local = field;
    var _temp = local;
    (_temp != null) ? _temp.Method() : null

or this:

    var local = field;
    (local != null) ? local.Method() : null

because you can never tell the difference. But again, you cannot turn it into this:

    (field != null) ? field.Method() : null

I believe it was prudent of both specifications to state that an optimizing compiler may reorder reads and writes so long as a single thread of execution observes them as written, rather than to generally allow introducing reads or eliminating them altogether.

Note that read elimination may be performed by either the C# compiler or the JIT compiler: multiple reads of the same non-volatile field, separated by instructions that don't write to that field and don't perform volatile operations (or their equivalent), may be collapsed into a single read. It's as if a thread never synchronizes with other threads, so it keeps observing the same value:

    public class Worker
    {
        private bool working = false;
        private bool stop = false;

        public void Start()
        {
            if (!working)
            {
                new Thread(Work).Start();
                working = true;
            }
        }

        public void Work()
        {
            while (!stop)
            {
                // TODO: actual work without volatile operations
            }
        }

        public void Stop()
        {
            stop = true;
        }
    }

There is no guarantee that Stop() will stop the worker. Microsoft's .NET implementation guarantees that stop = true; is a visible side effect, but it doesn't guarantee that the read of stop inside Work() is not elided into this:

    public void Work()
    {
        bool localStop = stop;
        while (!localStop)
        {
            // TODO: actual work without volatile operations
        }
    }

That comment says quite a lot. To perform this optimization, the compiler must prove that there are no volatile operations whatsoever, either directly in the block or indirectly in the whole tree of methods and properties it calls.

In this particular case, one correct implementation is to declare stop as volatile. But there are more options: using the equivalent Volatile.Read and Volatile.Write, using Interlocked.CompareExchange, using a lock statement around accesses to stop, using something equivalent to a lock such as a Mutex, or a Semaphore or SemaphoreSlim if you don't want the lock to have thread affinity (i.e. so you can release it on a thread other than the one that acquired it), or using a ManualResetEvent or ManualResetEventSlim instead of stop, in which case you can make Work() sleep with a timeout while waiting for the stop signal before the next iteration, etc.
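For instance, here is the Volatile.Read/Volatile.Write variant of that fix (a sketch; the class is renamed to avoid clashing with the Worker above):

    using System.Threading;

    public class StoppableWorker
    {
        private bool stop = false;

        public void Work()
        {
            // A volatile read cannot be hoisted out of the loop,
            // so the write performed by Stop() is eventually observed.
            while (!Volatile.Read(ref stop))
            {
                // TODO: actual work
            }
        }

        public void Stop()
        {
            Volatile.Write(ref stop, true);
        }
    }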


3.

One significant difference between .NET's volatile synchronization and Java's volatile synchronization is that Java requires you to use the same volatile location, whereas .NET only requires that an acquire (volatile read) happens after a release (volatile write). So, in principle, you can synchronize in .NET with the following code, but you can't synchronize with the equivalent code in Java:

    using System;
    using System.Threading;

    public class SurrealVolatileSynchronizer
    {
        public volatile bool v1 = false;
        public volatile bool v2 = false;
        public int state = 0;

        public void DoWork1(object b)
        {
            var barrier = (Barrier)b;
            barrier.SignalAndWait();
            Thread.Sleep(100);
            state = 1;
            v1 = true;
        }

        public void DoWork2(object b)
        {
            var barrier = (Barrier)b;
            barrier.SignalAndWait();
            Thread.Sleep(200);
            bool currentV2 = v2;
            Console.WriteLine("{0}", state);
        }

        public static void Main(string[] args)
        {
            var synchronizer = new SurrealVolatileSynchronizer();
            var thread1 = new Thread(synchronizer.DoWork1);
            var thread2 = new Thread(synchronizer.DoWork2);
            var barrier = new Barrier(3);
            thread1.Start(barrier);
            thread2.Start(barrier);
            barrier.SignalAndWait();
            thread1.Join();
            thread2.Join();
        }
    }

This surreal example expects threads and Thread.Sleep(int) to take exact amounts of time. If they do, it synchronizes correctly, because DoWork2 performs a volatile read (acquire) after DoWork1 performs a volatile write (release).

In Java, even with such unrealistic expectations fulfilled, this would not guarantee synchronization. In DoWork2 you would have to read from the same volatile field you wrote to in DoWork1.


"I read the specifications, and they do not say anything about whether a volatile write will EVER be observed by another thread (volatile read or not). Is this right or wrong?"

Let me rephrase the question:

Is it right that the specification says nothing about this?

No. The specification is very clear on this.

Is a volatile write guaranteed to be observed by another thread?

Yes, if the other thread has a critical execution point. A side effect is guaranteed to be ordered relative to a critical execution point.

A volatile write is such a side effect, and certain events are critical execution points, including thread starts and stops. See the specification for the list of those.

Suppose, for example, that a thread Alpha sets the volatile int field v to one, starts a thread Bravo that reads v, and then joins Bravo. (That is, it blocks on Bravo terminating.)

At that point we have a side effect (the write), a critical execution point (the thread start), and a second side effect (the volatile read). Bravo is therefore required to read one from v. (Assuming no other thread has written to it in the meantime, of course.)

Now suppose Bravo increments v to two and terminates. That's a side effect (the write) and a critical execution point (the thread termination).

When thread Alpha now resumes and performs a volatile read of v, it is required to read two. (Again, assuming no other thread has written to it.)

The ordering of Bravo's write side effect and Bravo's termination must be preserved; plainly, Alpha does not resume until Bravo terminates, and so it is required to observe the write.
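Here is the Alpha/Bravo scenario as code (a sketch of the example described above; the naming is mine):

    using System;
    using System.Threading;

    public static class AlphaBravo
    {
        private static volatile int v;

        public static void Main() // runs on thread Alpha
        {
            v = 1; // volatile write, ordered before the critical execution point below

            var bravo = new Thread(() =>
            {
                Console.WriteLine(v); // Bravo is required to read one
                v = 2;                // volatile write, ordered before thread termination
            });
            bravo.Start();            // critical execution point: thread start
            bravo.Join();             // critical execution point: Bravo's termination

            Console.WriteLine(v);     // Alpha is required to read two
        }
    }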


Yes, volatile is about fences, and fences are about ordering. So when do writes become visible? That's out of scope: it's really part of the implementation at every layer (compiler, JIT, CPU, etc.), but every implementation should have a decent, practical answer to the question.
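For completeness, .NET also exposes an explicit full fence, Thread.MemoryBarrier(); here is a sketch (my own) of using it to order two ordinary writes:

    using System.Threading;

    public class ExplicitFence
    {
        private int data;
        private bool ready;

        public void Publish()
        {
            data = 42;
            Thread.MemoryBarrier(); // full fence: the write above cannot be reordered past the write below
            ready = true;
        }
    }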


Source: https://habr.com/ru/post/1269097/

