How to prevent corruption in a parallel stack implemented with atomic compare-and-swap

The following simplified C program demonstrates a problem I am facing with a parallel stack implemented using the GNU compare-and-swap built-ins on an Intel processor. It took me a while to figure out what was going on, but now that I see it, I can tell it is within the guarantees provided by atomic compare-and-swap.

When a node is popped from the stack, modified, and then pushed back onto the stack, the modified value can become the new head of the stack, corrupting it. The comments in test_get describe the order of events that triggers this.

Is there a way to reliably use the same node with the same stack more than once? This is a contrived example, but even returning an unmodified node to gHead can leak nodes pushed by other threads in the meantime. The whole point of this data structure was to reuse the same nodes.

 #include <stdint.h>
 #include <stdlib.h>
 #include <pthread.h>

 typedef struct test_node {
     struct test_node *next;
     void *data;
 } *test_node_p;

 test_node_p gHead = NULL;
 unsigned gThreadsDone = 0;

 void test_put( test_node_p inMemory ) {
     test_node_p scratch;
     do {
         scratch = gHead;
         inMemory->next = scratch;
     } while ( !__sync_bool_compare_and_swap( &gHead , scratch , inMemory ) );
 }

 test_node_p test_get( void ) {
     test_node_p result;
     test_node_p scratch;
     do {
         result = gHead;
         if ( NULL == result ) break;
         // other thread acquires result, modifies next
         scratch = result->next;  // scratch is 0xDEFACED...
         // other thread returns result to gHead
     } while ( !__sync_bool_compare_and_swap( &gHead , result , scratch ) );
     // this thread corrupts gHead with the 0xDEFACED... value
     if ( NULL == result ) {
         result = (test_node_p)malloc( sizeof(struct test_node) );
     }
     return result;
 }

 void *memory_entry( void *in ) {
     test_node_p node;
     int index , count = 1000;
     for ( index = 0 ; index < count ; ++index ) {
         node = test_get();
         *(uint64_t *)(node) |= 0xDEFACED000000000ULL;
         test_put( node );
     }
     __sync_add_and_fetch( &gThreadsDone , 1 );
     return NULL;
 }

 int main( void ) {
     unsigned index , count = 8;
     pthread_t thread;
     gThreadsDone = 0;
     for ( index = 0 ; index < count ; ++index ) {
         pthread_create( &thread , NULL , memory_entry , NULL );
         pthread_detach( thread );
     }
     while ( __sync_add_and_fetch( &gThreadsDone , 0 ) < count ) {}
     return 0;
 }

I run this test on a machine with 8 logical cores. With only 4 threads the problem rarely appears, but with 8 it is easy to reproduce. I have a MacBook with an Intel Core i7.

I am not interested in using some library or framework that has solved this; I want to know how it was solved. Thanks.

Edit:

Here are two solutions that do not work in my case.

Some architectures provide LL/SC (load-linked / store-conditional) instruction pairs that perform the atomic test on an address rather than on a value. Any write to the address, even of the same value, causes the store to fail.

Some architectures provide a compare-and-swap wider than pointer size. In that case a monotonic counter is packed alongside the pointer and atomically incremented on every use, so the value always changes. Some Intel chips support this, but there is no GNU built-in wrapper for it.

Here is the play-by-play. Two threads, A and B, both reach the point in test_get where they have the same value for result , and it is not NULL . Then the following sequence occurs:

  • Thread A passes the compare-and-swap and returns result from test_get
  • Thread A modifies the contents of result
  • Thread B dereferences result , reading whatever thread A put there into scratch
  • Thread A finishes with result and calls test_put
  • Thread A passes the compare-and-swap in test_put , returning result to gHead
  • Thread B reaches the compare-and-swap in test_get and passes
  • Thread B has now corrupted gHead with the value thread A wrote

Here is a similar scenario with three threads that does not require thread A to modify result .

  • Thread A passes the compare-and-swap and returns result from test_get
  • Thread A does not modify the contents of result
  • Thread B dereferences result , reading the correct value into scratch
  • Thread C calls test_put with an unrelated node and succeeds
  • Thread A calls test_put and succeeds in returning result to gHead
  • Thread B reaches the compare-and-swap in test_get and passes
  • Thread B has now corrupted gHead , leaking whatever thread C pushed

In either case the problem is that thread A performs compare-and-swap twice, once for get and again for put, before thread B reaches its compare-and-swap for get. Thread B's value for scratch should be rejected, but it is not, because the value in gHead looks right.

Any solution that merely makes this less likely, without actually preventing it, just makes the bug harder to track down.

Note that the scratch variable is just a name for wherever the dereferenced result value is held before the atomic instruction begins. Eliminating the name does not eliminate the window between the dereference and the compare during which the thread can be interrupted.

Note also that atomic means the operation succeeds or fails as a whole. Any aligned pointer read is implicitly atomic on the hardware in question. As far as other threads are concerned, there is no interruptible point at which only half of the pointer has been read.

3 answers

(I am discarding my previous answer.)

The problem is that you have no mechanism for atomically reading both gHead and gHead->next , yet that is what a lock-free stack requires. Since you already intend to spin in a loop to handle compare-and-swap collisions, you can use the equivalent of a spin lock:

 int gGetLock = 0;

 void lock_get( void ) {
     while ( !__sync_bool_compare_and_swap( &gGetLock, 0, 1 ) ) {}
 }

 void unlock_get( void ) {
     int unlock_get_success = __sync_bool_compare_and_swap( &gGetLock, 1, 0 );
     assert( unlock_get_success );
 }

Now the loop in test_get() can be surrounded by lock_get() and unlock_get() . Only one thread at a time then races the compare-and-swap in test_get() against test_put() . However, Jens's version of the CAS loop seems cleaner:

 lock_get();
 result = __sync_val_compare_and_swap( &gHead, 0, 0 );
 while ( ( oldval = result ) ) {
     result = __sync_val_compare_and_swap( &gHead, result, result->next );
     if ( oldval == result ) break;
 }
 unlock_get();

This fulfills the intention that only one thread at a time should pop the head.


Never access an atomic variable with a plain read. I think __sync_val_compare_and_swap is more convenient for a compare-and-swap loop like yours.

 /* read the head atomically */
 result = __sync_val_compare_and_swap( &gHead, 0, 0 );
 /* compare and swap until we succeed in placing the next field */
 while ( ( oldval = result ) ) {
     result = __sync_val_compare_and_swap( &gHead, result, result->next );
     if ( oldval == result ) break;
 }

If you have a CAS variable (in your case gHead ), you should always access it with CAS, or protect it with a lock, for both reads and writes. Things like result = gHead; are a big no-no.

Rereading your question: a LIFO is a stack. Any CAS-based data structure works by changing only one thing; for a stack, that is the top of the stack. It looks like you are building a linked list. I'm sure there are clever ways to make an atomic linked list.

But for a stack, use a single stack pointer, like everyone else :)


Source: https://habr.com/ru/post/1487595/

