Assembly instructions for replacing an OpenMP critical region

Question


I have an array of elements that are processed by OpenMP tasks. A task may append new elements to the end of the array; these new elements must be processed as well, and they may in turn produce further elements. I am currently using this code:

 int p;
 #pragma omp critical
 {
     p = l.n++;
 }

It simply reserves a slot at the end of the array. The type of l is:

 struct list {
     int n;       /* number of elements */
     double *e;   /* element storage */
 };

and p is then used as the index at which the new item is stored. I was wondering whether this operation can be performed without a critical section. Is there an assembly instruction that atomically copies the value and then increments the original?

The code will run on a Nehalem processor, so there is no need to worry about older machines.


assembly openmp critical-section

Patrik Sep 11 '12 at 9:37


3 answers

Yes, on x86 there are several possible options.

XADD r/m, r

This instruction adds the second operand (r) to the first operand (r/m) and loads the second operand (r) with the original value of the first operand (r/m).

To use it, load the second operand (a register) with the increment amount (1 here, I guess) and make the first operand the memory location to be incremented.

The instruction must be preceded by the LOCK prefix, which is what makes it atomic.
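A minimal sketch of issuing LOCK XADD from C through GCC/Clang extended inline assembly, assuming a GCC-compatible compiler. The helper name fetch_and_add is hypothetical, and on non-x86 targets the sketch falls back to GCC's __atomic_fetch_add builtin so that it still compiles.

```c
/* Hypothetical helper: atomically adds inc to *ptr and returns the value
 * *ptr held before the add. Assumes a GCC-compatible compiler. */
static inline int fetch_and_add(int *ptr, int inc)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    __asm__ __volatile__(
        "lock xaddl %0, %1"       /* tmp = *ptr; *ptr += inc; inc = tmp  */
        : "+r"(inc), "+m"(*ptr)   /* both operands are read and written  */
        :
        : "memory", "cc");        /* full barrier; the add changes flags */
    return inc;                   /* now holds the original *ptr         */
#else
    /* Non-x86 fallback: the compiler picks the native atomic primitive. */
    return __atomic_fetch_add(ptr, inc, __ATOMIC_SEQ_CST);
#endif
}
```

With int n = 5;, the call fetch_and_add(&n, 1) returns 5 and leaves n equal to 6, which is exactly the p = l.n++ pattern from the question.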

The InterlockedAdd() function in Microsoft Visual C++ does this and, AFAIR, uses XADD when it is available (it has been since the i80486).

Another way is to use a loop around CMPXCHG.

Pseudocode:

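 int oldValue, newValue;
 while (true) {
     oldValue = l.n;
     newValue = oldValue + 1;
     if (CAS(&l.n, newValue, oldValue) == oldValue)
         break;          /* no other thread changed l.n in between */
 }
 p = oldValue;           /* the reserved index */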

CAS(), which stands for Compare And Swap (a generic term in parallel programming), is a function that attempts to atomically replace a value in memory with a new value. The replacement succeeds only if the value currently in memory equals the last parameter, oldValue; otherwise memory is left unchanged. Either way, CAS returns the original value from memory, which tells us whether the replacement succeeded (we compare the return value with oldValue). A failure (a return value different from oldValue) means that another thread changed the value in memory between our reading oldValue and our attempt to replace it with newValue. In that case, we simply repeat the whole procedure.
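For comparison, here is a minimal sketch of the same retry loop written with C11 <stdatomic.h>. It assumes the counter is declared atomic_int (the question's struct uses a plain int), and cas_fetch_increment is a hypothetical name. One difference from the generic CAS described here: C11's atomic_compare_exchange_weak returns a bool and, on failure, writes the current memory value back into oldValue instead of returning it.

```c
#include <stdatomic.h>

/* Hypothetical helper: fetch-and-increment built from compare-and-swap.
 * Returns the value of *counter before the increment. */
static int cas_fetch_increment(atomic_int *counter)
{
    int oldValue = atomic_load(counter);
    /* Stores oldValue + 1 only if *counter still equals oldValue.
     * On failure (another thread got in first, or a spurious failure of
     * the weak form), oldValue is refreshed with the current value and
     * the loop retries. */
    while (!atomic_compare_exchange_weak(counter, &oldValue, oldValue + 1))
        ;
    return oldValue;
}
```

The weak form is the usual choice inside a retry loop, since a spurious failure just costs one more iteration.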

The CMPXCHG instruction (used with the LOCK prefix) is the x86 CAS.

In Microsoft Visual C++, InterlockedCompareExchange() uses CMPXCHG to implement CAS.

If XADD is not available, InterlockedAdd() is implemented using CAS (CMPXCHG via InterlockedCompareExchange()).

Other CPUs may have different capabilities; some of them allow a short sequence of related instructions to execute atomically.


Alexey Frunze Sep 11 '12 at 10:14


This is really just an atomic increment that returns the old value, which looks like this:

 mov p, 1            ; p must be a register
 lock xadd [l.n], p

And now you know. That said, I see no reason to use it, since there are ways to do this without resorting to assembly.
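One way to avoid assembly entirely is C11 <stdatomic.h>. A minimal sketch, assuming the counter in the question's struct can be changed to atomic_int; reserve_slot is a hypothetical helper name.

```c
#include <stdatomic.h>

/* Variant of the question's struct. Assumption: n is made atomic_int
 * (the original declares it as a plain int). */
struct list {
    atomic_int n;   /* number of slots handed out so far */
    double *e;      /* storage, assumed large enough for every slot */
};

/* Hypothetical helper: reserves one slot at the end of the array and
 * returns its index. On x86, GCC and Clang typically compile a fetch-add
 * whose result is used into a single LOCK XADD. */
static int reserve_slot(struct list *l)
{
    return atomic_fetch_add(&l->n, 1);
}
```

A task would then write its new item with l->e[reserve_slot(l)] = value;, with no critical section involved.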


harold Sep 11 '12 at 10:05


Arjunshankar · Accepted Answer · 2012-09-11T09:48:02+0000

 #pragma omp atomic capture
 p = l.n++;

This should compile to an atomic increment that also captures the old value, if the hardware supports it.
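A minimal sketch of this approach in context, assuming OpenMP 3.1 or later (the version that introduced atomic capture). A parallel for loop stands in for the question's tasks, and append_parallel is a hypothetical name. Compiled without -fopenmp, the pragmas are ignored and the loop simply runs serially.

```c
struct list { int n; double *e; };   /* as in the question */

/* Hypothetical driver: each of `count` iterations appends one value to l.
 * Assumes l->e has room for l->n + count elements. */
static void append_parallel(struct list *l, int count)
{
    #pragma omp parallel for
    for (int k = 0; k < count; k++) {
        int p;                        /* private to this iteration */
        #pragma omp atomic capture
        p = l->n++;                   /* p = old n; n incremented atomically */
        l->e[p] = (double)k;          /* slot p belongs to this iteration only */
    }
}
```

Because every iteration receives a distinct p, the writes to l->e never collide even though no critical section is used.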

Read more about #pragma omp atomic in this question: openMP, atomic vs critical?

And here is the Intel documentation for #pragma omp atomic.

I tried to make a minimal example with gcc -fopenmp -m32 -O2 -S:

 int i, j;

 void foo(void)
 {
     #pragma omp atomic capture
     i = j++;
 }

What I got is the simple atomic “fetch and add” we want:

 movl  $1, %eax        # eax = 1
 lock xaddl %eax, j    # atomic { swap(eax, j); j = eax + j; }
 movl  %eax, i         # i = eax
 ret
