it is used to make sure that the value will be read / written as a whole
That is only a small part of atomicity. At its core it means "uninterruptible": an instruction whose side effects cannot be interleaved with another instruction. By design, a memory update is atomic when it can be performed in a single memory-bus cycle. That requires the address of the memory location to be aligned so that one cycle can update it. A misaligned access requires extra work: part of the bytes are written in one bus cycle and part in another. It is then no longer uninterruptible.
Getting aligned updates is fairly easy; it is a guarantee provided by the compiler, or, more broadly, by the memory model the compiler implements. It simply picks memory addresses that are aligned, sometimes deliberately leaving a few bytes of unused space to get the next variable aligned. An update to a variable that is larger than the processor's native word size can never be atomic.
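A minimal sketch of that distinction, using std::atomic (C++11): is_lock_free() reports whether the platform can update the type in a single uninterruptible operation. Big is a hypothetical struct chosen only to exceed the native word size.

```cpp
#include <atomic>
#include <cstdint>
#include <iostream>

// Hypothetical struct, deliberately larger than any native word size.
struct Big { std::int64_t a, b, c; };

int main() {
    std::atomic<std::int32_t> small{0};
    std::atomic<Big> big{Big{}};

    // Typically 1 on mainstream platforms: fits in one bus cycle.
    std::cout << "int32 lock-free: " << small.is_lock_free() << '\n';

    // Typically 0: the library falls back to an internal lock.
    std::cout << "Big lock-free:   " << big.is_lock_free() << '\n';
}
```

On gcc/clang this may need linking against libatomic for the non-lock-free case.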
More important, though, is which processor instructions you need to implement threading. Every processor implements a variant of the CAS instruction: compare-and-swap. It is the core instruction needed for synchronization. All of the higher-level synchronization primitives are built on top of this basic instruction: monitors (a.k.a. condition variables), mutexes, events, critical sections, and semaphores.
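To make that concrete, here is a minimal sketch (not from the original text) of a spinlock built directly on compare-and-swap, via std::atomic's compare_exchange_weak:

```cpp
#include <atomic>

class Spinlock {
    std::atomic<bool> locked{false};
public:
    void lock() {
        bool expected = false;
        // CAS: atomically swap false -> true; fails while another thread
        // holds the lock. On failure, `expected` receives the current value.
        while (!locked.compare_exchange_weak(expected, true,
                                             std::memory_order_acquire)) {
            expected = false;  // reset and retry
        }
    }
    void unlock() {
        locked.store(false, std::memory_order_release);
    }
};
```

In real code std::atomic_flag (or an OS mutex) would be the idiomatic choice; the bool version just makes the CAS loop explicit.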
Beyond that, a processor usually provides extra instructions to make simple operations atomic. Incrementing a variable is one: it is otherwise an interruptible operation, since it requires a read-modify-write sequence. Needing it to be atomic is very common; most C++ programs rely on it, for example, to implement reference counting.
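A sketch of that use, assuming a hand-rolled RefCounted type (hypothetical; std::shared_ptr does the same internally):

```cpp
#include <atomic>

struct RefCounted {
    std::atomic<int> refs{1};

    void add_ref() {
        // Atomic read-modify-write; compiles to e.g. LOCK XADD on x86.
        refs.fetch_add(1, std::memory_order_relaxed);
    }
    // Returns true when the caller dropped the last reference and
    // should destroy the object.
    bool release() {
        // acq_rel so the destroying thread sees all prior writes.
        return refs.fetch_sub(1, std::memory_order_acq_rel) == 1;
    }
};
```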
volatile guarantees thread safety
That is not true. It is an attribute that dates from much simpler times, when machines had only a single processor core. It only affects code generation, in particular how the code optimizer tries to eliminate memory accesses by keeping a copy of the value in a processor register instead. That makes a big, big difference in execution speed: reading a value from a register is easily three times faster than reading it from memory.
Applying volatile ensures that the code optimizer does not consider the register copy accurate and forces it to read memory again. It really only matters for memory that is not stable by itself: devices that expose their registers through memory-mapped I/O. It has been heavily abused as a way to put semantics on top of processors with a weak memory model, Itanium being the most egregious example. What you get from volatile today is highly dependent on the specific compiler and runtime you use. Never use it for thread safety; always use a synchronization primitive instead.
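A sketch of the difference, using a hypothetical producer/consumer flag: volatile would only stop the optimizer from caching the flag in a register, while std::atomic also provides the ordering guarantees.

```cpp
#include <atomic>
#include <thread>

// volatile bool ready = false;   // stops register caching, but gives no
//                                // atomicity or ordering: NOT thread-safe
std::atomic<bool> ready{false};   // the synchronization primitive to use

int payload = 0;

void producer() {
    payload = 42;
    ready.store(true, std::memory_order_release);  // publishes payload
}

void consumer() {
    while (!ready.load(std::memory_order_acquire)) { /* spin */ }
    // acquire pairs with release: payload is guaranteed to be 42 here.
}

int main() {
    std::thread t1(producer), t2(consumer);
    t1.join();
    t2.join();
}
```

With the volatile version, a weakly ordered processor (the Itanium example above) could let the consumer see the flag before the payload write becomes visible.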
simply atomic / volatile is thread safe
Programming would be a lot simpler if that were true. Atomic operations only cover the very simplest of operations; a real program often needs to keep an entire object consistent: the requirement that all of its members are updated atomically and that a partially updated view of the object is never exposed. Something as simple as iterating a list is a core example: you cannot have another thread modifying the list while you are looking at its elements. That is when you need to reach for the higher-level synchronization primitives, the kind that can block code until it is safe to proceed.
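A minimal sketch of that list example, with a std::list guarded by a mutex (names hypothetical):

```cpp
#include <list>
#include <mutex>

std::mutex list_mutex;
std::list<int> shared_list;

void add(int value) {
    std::lock_guard<std::mutex> guard(list_mutex);
    shared_list.push_back(value);
}

int sum_all() {
    // The lock must be held for the *entire* iteration; per-element
    // atomicity would not stop another thread from inserting or erasing
    // nodes mid-traversal.
    std::lock_guard<std::mutex> guard(list_mutex);
    int sum = 0;
    for (int v : shared_list) sum += v;
    return sum;
}
```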
Real programs often suffer from this synchronization need and exhibit Amdahl's-law behavior: adding an extra thread does not actually make the program faster, and sometimes makes it slower. Whoever invents a better mousetrap is guaranteed a Nobel prize; we are still waiting.
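For reference, Amdahl's law puts a number on that behavior: if only a fraction p of the work can run in parallel, n threads give a speedup of

```latex
S(n) = \frac{1}{(1 - p) + \frac{p}{n}},
\qquad \lim_{n \to \infty} S(n) = \frac{1}{1 - p}
```

so with, say, p = 0.9, no number of threads gets you past a 10x speedup, and that is before counting the synchronization overhead itself.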