Atomically reading/writing an int value without any additional operation on the value itself

GCC offers a good set of built-in functions for atomic operations, and on macOS or iOS Apple also offers a good set of atomic functions. However, all of these functions perform an operation, e.g. addition/subtraction, a logical operation (AND/OR/XOR), or compare-and-set/compare-and-swap. What I am looking for is a way to atomically assign/read an int value, for example:

 int a;
 /* ... */
 a = someVariable;

That's all. a will be read by another thread, and all that matters is that a holds either its old value or its new value. Unfortunately, the C standard does not guarantee that assigning or reading a value is an atomic operation. I remember reading somewhere once that writing or reading a variable of type int is guaranteed to be atomic in GCC (regardless of the size of int), but I have searched everywhere on the GCC homepage and cannot find this statement (it may have been removed).

I cannot use sig_atomic_t because sig_atomic_t has no guaranteed size and may have a different size than int.

Since only one thread will ever “write” a value to a, while both threads “read” the current value of a, I do not need the operations themselves to be atomic, e.g.:

 /* thread 1 */
 someVariable = atomicRead(a);
 /* Do something with someVariable, non-atomic, when done */
 atomicWrite(a, someVariable);

 /* thread 2 */
 someVariable = atomicRead(a);
 /* Do something with someVariable, but never write to a */
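One way to fill in the pseudocode above is with GCC's __atomic builtins (GCC 4.7 and later, also supported by Clang), which perform a plain load or store with a chosen memory order and no arithmetic. They are not mentioned in the question or the answers, so treat this as a swapped-in technique and a minimal sketch, assuming a GCC-compatible compiler. atomicRead/atomicWrite are the question's hypothetical helper names, here taking a pointer, and writerStep is a hypothetical stand-in for thread 1's work.

```c
/* Sketch: plain atomic load/store of an int via GCC/Clang __atomic builtins.
   Assumes GCC >= 4.7 or Clang. atomicRead/atomicWrite are the question's
   hypothetical names; they take a pointer instead of the variable itself. */

/* Atomically load *p. Acquire ordering makes writes that the writer made
   before its release store visible after this load. */
static inline int atomicRead(int *p)
{
    return __atomic_load_n(p, __ATOMIC_ACQUIRE);
}

/* Atomically store v into *p. Release ordering publishes the writer's
   earlier writes. No read-modify-write is performed. */
static inline void atomicWrite(int *p, int v)
{
    __atomic_store_n(p, v, __ATOMIC_RELEASE);
}

/* Thread 1 from the question: read, compute non-atomically, write back.
   Correct only because no other thread ever stores to *p. */
static void writerStep(int *p)
{
    int someVariable = atomicRead(p);
    someVariable += 1; /* hypothetical "do something" */
    atomicWrite(p, someVariable);
}
```

On x86, an acquire load and a release store typically compile to ordinary mov instructions, so this pattern avoids both the mutex and the locked read-modify-write instructions.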

If both threads wrote to a, then all operations would have to be atomic, but as things are, that would only waste CPU time, and our project is extremely short on CPU resources. So far we use a mutex around the read/write operations on a, and even though the mutex is held for such a tiny amount of time, this already causes problems (one of the threads is a real-time thread, and blocking on the mutex makes it miss its real-time constraints, which is pretty bad).

Of course, I could use __sync_fetch_and_add to read the variable (simply adding 0 to it so that its value does not change), and use __sync_val_compare_and_swap to write it (since I know its old value, passing that in ensures the value is always swapped), but won't this add unnecessary overhead?
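For reference, the workaround described in the question might look like the sketch below, using the legacy GCC __sync builtins it names. syncRead and syncWrite are hypothetical names. Both calls are full-barrier read-modify-write operations (on x86 they compile to lock-prefixed instructions), which is the overhead the question is worried about.

```c
/* Sketch of the question's workaround with legacy GCC __sync builtins.
   syncRead/syncWrite are hypothetical names. Both are full barriers. */

/* Read by adding 0: returns the current value and leaves *p unchanged. */
static int syncRead(int *p)
{
    return __sync_fetch_and_add(p, 0);
}

/* Write by compare-and-swap. Because this thread is the only writer,
   oldValue is known to be current, so the swap always succeeds.
   Returns whatever *p contained before the call. */
static int syncWrite(int *p, int oldValue, int newValue)
{
    return __sync_val_compare_and_swap(p, oldValue, newValue);
}
```

If oldValue were stale, __sync_val_compare_and_swap would leave *p unchanged and return the actual current value, which is why this trick relies on there being a single writer.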

+6
2 answers

A __sync_fetch_and_add with a 0 argument is indeed the best option if you want your load to be atomic and to act as a memory barrier. Similarly, you can use an AND with 0 or an OR with -1 to store 0 or -1 atomically with a memory barrier. For writing, you can use __sync_lock_test_and_set (actually an xchg operation) if an “acquire” barrier is enough, or, if you use Clang, __sync_swap (which is an xchg operation with a full barrier).
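The tricks above map onto GCC's __sync builtins roughly as follows. This is a sketch with hypothetical helper names. It uses GCC's __sync_lock_test_and_set for the exchange; GCC documents it as an acquire barrier only and notes that some targets support storing only the constant 1. Clang's __sync_swap is left out because GCC does not provide it.

```c
/* Sketch of the answer's load/store tricks with GCC __sync builtins.
   Helper names are hypothetical. */

/* Load with a full barrier: adding 0 leaves the value unchanged. */
static int loadFull(int *p) { return __sync_fetch_and_add(p, 0); }

/* Store 0 with a full barrier: x & 0 == 0 for any x. */
static void storeZero(int *p) { (void)__sync_fetch_and_and(p, 0); }

/* Store -1 with a full barrier: x | -1 == -1 for any x. */
static void storeMinusOne(int *p) { (void)__sync_fetch_and_or(p, -1); }

/* Store an arbitrary value with an atomic exchange; returns the old value.
   Acquire barrier only, per the GCC documentation. */
static int storeExchange(int *p, int v)
{
    return __sync_lock_test_and_set(p, v);
}
```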

However, in many cases this is overkill, and you may prefer to add memory barriers manually. If you don't need a memory barrier, you can use a volatile load to atomically read/write a variable that is aligned and no wider than a word:

 #define __sync_access(x) (*(volatile __typeof__(x) *) &(x)) 

(This macro is an lvalue, so you can also use it for a store, e.g. __sync_access(x) = 0.) The macro implements the same semantics as C++11 memory_order_consume, but only under two assumptions:

  • that your machine has coherent caches; if not, you need a memory barrier or a global cache flush before the load (or before the first of a group of loads).

  • that your machine is not a DEC Alpha. The Alpha had very relaxed semantics for reordering memory accesses, so on it you need a memory barrier after the load (and after each load in a group of loads). On the Alpha, the macro provides only memory_order_relaxed semantics. BTW, the first versions of the Alpha could not even store bytes atomically (only a word, which was 8 bytes).
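A minimal sketch of the macro in use, assuming GCC or Clang. It reuses the answer's macro name as-is (identifiers starting with "__" are reserved for the implementation). readA, writeA and readABarrier are hypothetical helpers for the question's variable a. The barrier variant uses GCC's __sync_synchronize as the manual full barrier the answer mentions for the Alpha case. Note that volatile only prevents the compiler from splitting or eliding the access; atomicity still depends on the hardware assumptions listed above.

```c
/* The answer's macro: forces one volatile access to x (usable as an lvalue).
   Name kept only to match the answer; "__" names are reserved. */
#define __sync_access(x) (*(volatile __typeof__(x) *) &(x))

/* The question's variable; the compiler aligns an int naturally. */
static int a;

/* One volatile load, no memory barrier. */
static int readA(void)
{
    return __sync_access(a);
}

/* The macro is an lvalue, so this is one volatile store. */
static void writeA(int v)
{
    __sync_access(a) = v;
}

/* Variant with a manual full barrier after the load, as the answer says
   a DEC Alpha would need. */
static int readABarrier(void)
{
    int v = __sync_access(a);
    __sync_synchronize();
    return v;
}
```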

In any case, __sync_fetch_and_add will work. As far as I know, no other machine imitated the Alpha, so neither assumption should cause problems on current computers.

+3

Volatile, aligned, word-sized reads/writes are atomic on most platforms. Checking the generated assembly is the best way to find out whether this holds on your platform. Atomic registers cannot build nearly as many interesting wait-free structures as more complex mechanisms such as compare-and-swap, which is why those are included.

See http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.56.5659&rank=3 for the theory.

Regarding __sync_fetch_and_add with a 0 argument: this seems like the safest bet. If you are worried about performance, profile the code and see whether you meet your performance goals. You may be falling victim to premature optimization.
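As a starting point for that profiling, here is a hypothetical micro-benchmark sketch (all names are made up for illustration) that times plain volatile loads against __sync_fetch_and_add(p, 0) loads. It assumes a POSIX system with clock_gettime. It is single-threaded and uncontended, so it only shows the baseline cost; numbers vary by CPU and compiler flags, and the real application under contention is what actually needs measuring.

```c
#define _POSIX_C_SOURCE 199309L /* for clock_gettime / CLOCK_MONOTONIC */
#include <stdio.h>
#include <time.h>

/* Nanoseconds between two timestamps. */
static double elapsedNs(struct timespec a, struct timespec b)
{
    return (b.tv_sec - a.tv_sec) * 1e9 + (b.tv_nsec - a.tv_nsec);
}

/* n plain volatile loads (the compiler cannot elide them). */
static long sumVolatile(int *p, long n)
{
    long s = 0;
    for (long i = 0; i < n; i++)
        s += *(volatile int *)p;
    return s;
}

/* n locked read-modify-write "loads" that add 0. */
static long sumFetchAdd(int *p, long n)
{
    long s = 0;
    for (long i = 0; i < n; i++)
        s += __sync_fetch_and_add(p, 0);
    return s;
}

/* Times both loops and prints the average cost per load. */
static void compareLoads(long n)
{
    int x = 1;
    struct timespec t0, t1, t2;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    long s1 = sumVolatile(&x, n);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    long s2 = sumFetchAdd(&x, n);
    clock_gettime(CLOCK_MONOTONIC, &t2);
    printf("volatile:  %.2f ns/load (sum %ld)\n", elapsedNs(t0, t1) / n, s1);
    printf("fetch_add: %.2f ns/load (sum %ld)\n", elapsedNs(t1, t2) / n, s2);
}
```

Calling compareLoads(100000000L) from main prints one line per variant; the sums are printed only so that the loops have an observable result.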

+2

Source: https://habr.com/ru/post/895597/

