How to have atomic integers on machines that lack stdatomic.h?

I developed a multi-threaded program that depends on the availability of atomic_int, atomic_store and atomic_load from stdatomic.h. The program is compiled with GCC.

I have now tried, unsuccessfully, to compile the program on several older versions of the operating system that lack stdatomic.h. Unfortunately, it is a requirement that the program can be compiled on the older machines, so it is not enough to compile it on the new version of the operating system and run the binary on the old version.

Is there a way to emulate stdatomic.h on older machines, perhaps with some built-in GCC function?

Installing a newer version of GCC on the old operating system might be a solution, but the current build system has strict requirements on "gcc", and the new GCC would have to be compiled from source because the old systems do not have it in their package management system. So, ideally, the answer would be something that works on older versions of GCC.

+6
2 answers

Although this is not a complete solution for all applications, I found a way that supports the necessary basic functionality and passes at least some rudimentary multi-threaded tests:

    #define _Atomic(T) struct { volatile __typeof__(T) __val; }

    typedef _Atomic(int) atomic_int;

    #define atomic_load(object) \
        __sync_fetch_and_add(&(object)->__val, 0)

    #define atomic_store(object, desired) do { \
        __sync_synchronize(); \
        (object)->__val = (desired); \
        __sync_synchronize(); \
    } while (0)

The calls to __sync_synchronize and __sync_fetch_and_add are required; otherwise communication between the threads fails (I did not test removing only one of them, only removing both).

I am not entirely sure, however, that this solution works in all cases. I found it at https://gist.github.com/nhatminhle/5181506 , where the author does not recommend it for older versions of GCC.

In theory, you could also use a mutex. However, mutexes perform worse than atomics.
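
As a rough illustration of the mutex alternative, a fallback might look something like the sketch below. It assumes POSIX threads are available on the old systems, and the names locked_int, locked_int_init, locked_int_load and locked_int_store are hypothetical, made up for this example rather than taken from any standard header.

```c
#include <stddef.h>
#include <pthread.h>

/* Hypothetical type: an int guarded by a mutex, standing in for atomic_int. */
typedef struct {
    pthread_mutex_t lock;
    int val;
} locked_int;

/* Initialise the mutex and the value; returns 0 on success. */
static int locked_int_init(locked_int *obj, int initial)
{
    obj->val = initial;
    return pthread_mutex_init(&obj->lock, NULL);
}

/* Read the value while holding the lock. */
static int locked_int_load(locked_int *obj)
{
    int result;
    pthread_mutex_lock(&obj->lock);
    result = obj->val;
    pthread_mutex_unlock(&obj->lock);
    return result;
}

/* Write the value while holding the lock. */
static void locked_int_store(locked_int *obj, int desired)
{
    pthread_mutex_lock(&obj->lock);
    obj->val = desired;
    pthread_mutex_unlock(&obj->lock);
}
```

Every load and store pays for a lock and an unlock here, which is where the performance cost mentioned above would come from.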

Edit:

It is also possible to implement atomic_store as follows:

    #define atomic_store(object, desired) do { \
        for (;;) \
        { \
            __typeof__((object)->__val) oldval = atomic_load(object); \
            if (__sync_bool_compare_and_swap(&(object)->__val, oldval, desired)) \
            { \
                break; \
            } \
        } \
    } while (0)

However, this reduced performance from 139280.5 ops/sec (standard deviation 1799.6 ops/sec) to 131805.6 ops/sec (standard deviation 986.03 ops/sec). Thus, the performance degradation is statistically significant.

Edit 2:

The loop approach produces the following assembly code:

        .globl signal_completion
        .type signal_completion, @function
    signal_completion:
    .LFB18:
        leaq 4(%rdi), %rcx
    .L42:
        xorl %eax, %eax
        lock xaddl %eax, (%rcx)
        movl $1, %edx
        movl %eax, -4(%rsp)
        movl -4(%rsp), %eax
        lock cmpxchgl %edx, (%rcx)
        jne .L42
        rep ; ret
    .LFE18:
        .size signal_completion, .-signal_completion
        .p2align 4,,15

The __sync_synchronize approach, by contrast, produces the following code:

        .globl signal_completion
        .type signal_completion, @function
    signal_completion:
    .LFB18:
        movl $1, 4(%rdi)
        ret
    .LFE18:
        .size signal_completion, .-signal_completion
        .p2align 4,,15

... and on a machine that has stdatomic.h, it compiles to:

        .globl signal_completion
        .type signal_completion, @function
    signal_completion:
    .LFB43:
        .cfi_startproc
        movl $1, 4(%rdi)
        mfence
        ret
        .cfi_endproc
    .LFE43:
        .size signal_completion, .-signal_completion

So the only thing I am missing is the mfence. I assume it can be added with a simple piece of inline assembly, for example:

    asm volatile ("mfence" ::: "memory");

... placed after the second __sync_synchronize() in the definition of atomic_store.
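
Put together, the store macro might then look roughly like the sketch below. It is only a sketch under assumptions: the target is x86 with SSE2 (mfence does not exist elsewhere, hence the architecture guard), the typedef spells out what _Atomic(int) from the first snippet expands to, and COMPAT_MFENCE is a hypothetical helper name. On a compiler where __sync_synchronize already emits a fence, the extra mfence would be redundant but harmless.

```c
/* What _Atomic(int) from the first snippet expands to. */
typedef struct { volatile int __val; } atomic_int;

/* Hypothetical helper: a full hardware fence on x86, nothing elsewhere. */
#if defined(__x86_64__) || defined(__i386__)
#  define COMPAT_MFENCE() __asm__ __volatile__ ("mfence" ::: "memory")
#else
#  define COMPAT_MFENCE() ((void)0)
#endif

/* Store with the explicit mfence added after the second __sync_synchronize. */
#define atomic_store(object, desired) do { \
    __sync_synchronize(); \
    (object)->__val = (desired); \
    __sync_synchronize(); \
    COMPAT_MFENCE(); \
} while (0)
```

With this in place, a call such as atomic_store(&flag, 1) on an atomic_int named flag should write the value and then issue the fence on x86.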

Edit 3:

Apparently, __sync_fetch_and_add is not optimized into a plain load, since the loop that polls the variable has this assembly output:

    .L29:
        xorl %eax, %eax
        lock xaddl %eax, (%rdi)
        testl %eax, %eax
        je .L29

If you use this instead:

 #define atomic_load(object) ((object)->__val) 

you get:

    .L29:
        movl (%rdi), %eax
        testl %eax, %eax
        je .L29

which is equivalent to what is generated on a machine that supports stdatomic.h:

    .L38:
        movl (%rdi), %eax
        testl %eax, %eax
        je .L38

Oddly enough, the __sync_fetch_and_add variant runs faster on my computer and in my test, even though it produces more complex code. Strange world, right?

+3

It is best to roll your own wrapper. Use stdatomic.h if it is available; otherwise emulate the operations using mutexes or platform-specific instructions.
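
A minimal sketch of such a wrapper is shown below, assuming GCC or a compatible compiler. All compat_* names and the COMPAT_HAVE_STDATOMIC macro are hypothetical. Detection relies on __has_include, which older compilers lack; those take the fallback branch unless the build system defines COMPAT_HAVE_STDATOMIC itself. The fallback reuses the __sync builtins from the other answer.

```c
/* Detect stdatomic.h where the compiler supports __has_include. */
#if !defined(COMPAT_HAVE_STDATOMIC) && defined(__has_include)
#  if __has_include(<stdatomic.h>)
#    define COMPAT_HAVE_STDATOMIC 1
#  endif
#endif

#ifdef COMPAT_HAVE_STDATOMIC
/* Modern path: forward to the standard C11 atomics. */
#  include <stdatomic.h>
typedef atomic_int compat_atomic_int;
#  define compat_load(obj)        atomic_load(obj)
#  define compat_store(obj, val)  atomic_store((obj), (val))
#else
/* Fallback path: emulate with the legacy GCC __sync builtins. */
typedef struct { volatile int val; } compat_atomic_int;
#  define compat_load(obj)        __sync_fetch_and_add(&(obj)->val, 0)
#  define compat_store(obj, val)  do { \
       __sync_synchronize(); \
       (obj)->val = (val); \
       __sync_synchronize(); \
   } while (0)
#endif
```

The rest of the program would then use only compat_atomic_int, compat_load and compat_store, so the choice between the two implementations stays in one header.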

+1

Source: https://habr.com/ru/post/1015491/

