Spinlock using GCC builtins

How can I be sure that data written by one CPU core while it holds a mutex is synchronized across the L1 caches of all other cores? I'm not talking about the variable that represents the lock itself, but about the memory locations that are modified while the lock is held.

This is for Linux, x86_64, and my code is:

    #include <sys/types.h>
    #include "dlog.h"

    uint *dlog_line;
    volatile int dlog_lock;

    char *dlog_get_new_line(void)
    {
            uint val;

            /* spin until the lock is acquired */
            while (!__sync_bool_compare_and_swap(&dlog_lock, 0, 1))
                    ;

            /* critical section: wrap the shared line index */
            val = *dlog_line;
            if (val == DT_DLOG_MAX_LINES)
                    val = 0;
            *dlog_line = val;

            /* release the lock */
            dlog_lock = 0;

            /* return of the new line pointer is elided here */
    }

Here, inside the dlog_get_new_line() function, I use the GCC builtin, so there should be no problem acquiring the lock. But how can I guarantee that, once the lock is released, the value pointed to by dlog_line propagates to the L1 caches of all the other CPU cores in the system?

I do not use pthreads; each process runs on a different CPU core.

2 answers

What you are asking about is called cache coherency. It is handled automatically by the hardware.

In short, you do not need to do anything extra as long as you use __sync_bool_compare_and_swap() (or any other atomic builtin) correctly.

As a simplified explanation, a thread will not return from the __sync_bool_compare_and_swap() call until all other processors can either see the new value or know that their local copy is stale.
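
To make this concrete, here is a minimal sketch (mine, not part of the original answer; the spin_lock/spin_unlock names are illustrative) of a lock built only on __sync_bool_compare_and_swap, with __sync_synchronize() providing the full barrier on release:

    static volatile int lock_word;          /* 0 = free, 1 = held */

    static void spin_lock(volatile int *l)
    {
            /* The CAS is a full memory barrier: once it succeeds, this
             * core cannot observe stale values left behind by the
             * previous lock holder. */
            while (!__sync_bool_compare_and_swap(l, 0, 1))
                    ;                       /* spin until the lock is free */
    }

    static void spin_unlock(volatile int *l)
    {
            /* Force all writes made in the critical section to become
             * visible before the lock word itself is cleared. */
            __sync_synchronize();
            *l = 0;
    }

Any writes performed between spin_lock() and spin_unlock() are then guaranteed to be seen by the next core that acquires the lock.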


If you are interested in what happens underneath, in hardware, there are various cache-coherence protocols that are used to ensure that a core never reads a stale copy of the data.

Here is a partial list of widely used protocols:

MSI
MESI
MOESI
MESIF

Modern hardware, as a rule, uses considerably more sophisticated variants of these.


There are two other GCC builtins that were invented precisely for the purpose you describe: __sync_lock_test_and_set and __sync_lock_release. They have so-called acquire/release semantics, which guarantees that the stored values of other variables become visible as needed while you hold your spinlock. These requirements are slightly weaker than the full consistency that __sync_bool_compare_and_swap provides, so it is better to use the tools that were designed specifically for the task.

They should also adapt well to the capabilities of different hardware. For example, on my x86_64 this places an mfence instruction next to the final atomic store to dlog_lock, but on other hardware it will adapt to the available instruction set.
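
As an illustration of this suggestion, here is a sketch (my code, with hypothetical helper names) of the lock from the question restructured around these two builtins:

    static volatile int dlog_lock;

    static void dlog_lock_acquire(void)
    {
            /* Atomically write 1 and return the previous value; keep
             * spinning while the lock was already held. The builtin
             * has acquire semantics. */
            while (__sync_lock_test_and_set(&dlog_lock, 1))
                    ;
    }

    static void dlog_lock_release(void)
    {
            /* Store 0 with release semantics: every write made while
             * the lock was held becomes visible to other cores before
             * the lock appears free. */
            __sync_lock_release(&dlog_lock);
    }

The compiler emits whatever barrier instructions the target architecture requires for these semantics, as noted above.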


Source: https://habr.com/ru/post/1389114/

