The following is a simplified C program that demonstrates the problem I am facing with a parallel stack implemented using GNU built-in to compare and share on an Intel processor. It took me a while to figure out what was going on, but now that I see it, I see that it is within the guarantees provided by atomic comparison and exchange.
When a node is unloaded from the stack, modified, and then pushed back onto the stack, the changed value can become the new header of the stack, corrupting it. The comments in test_get describe the order of events that trigger this.
Is there a way to reliably use the same node with the same stack more than once? This is an exaggerated example, but even the return of an unmodified node to gHead may occur by other nodes. The initial point of this data structure was the reuse of the same nodes.
typedef struct test_node { struct test_node *next; void *data; } *test_node_p; test_node_p gHead = NULL; unsigned gThreadsDone = 0; void test_put( test_node_p inMemory ) { test_node_p scratch; do { scratch = gHead; inMemory->next = scratch; } while ( !__sync_bool_compare_and_swap( &gHead , scratch , inMemory ) ); } test_node_p test_get( void ) { test_node_p result; test_node_p scratch; do { result = gHead; if ( NULL == result ) break; // other thread acquires result, modifies next scratch = result->next; // scratch is 0xDEFACED... // other thread returns result to gHead } while ( !__sync_bool_compare_and_swap( &gHead , result , scratch ) ); // this thread corrupts gHead with 0xDEFACED... value if ( NULL == result ) { result = (test_node_p)malloc( sizeof(struct test_node) ); } return result; } void *memory_entry( void *in ) { test_node_p node; int index , count = 1000; for ( index = 0 ; index < count ; ++index ) { node = test_get(); *(uint64_t *)(node) |= 0xDEFACED000000000ULL; test_put( node ); } __sync_add_and_fetch(&gThreadsDone,1); return NULL; } void main() { unsigned index , count = 8; pthread_t thread; gThreadsDone = 0; for ( index = 0 ; index < count ; ++index ) { pthread_create( &thread , NULL , memory_entry , NULL ); pthread_detach( thread ); } while ( __sync_add_and_fetch(&gThreadsDone,0) < count ) {} }
I run this test with 8 logical cores. When I use only 4 threads, the problem rarely arises, but with 8 it is easy to reproduce. I have a MacBook with Intel Core i7.
I am not interested in using some kind of library or framework that solved this, I want to know how it was solved. Thanks.
Edit:
Here are two solutions that do not work in my case.
Some architectures contain pairs of ll / sc commands that perform atomic tests at an address instead of a value. Any write to an address, even of the same value, prevents success.
Some architectures provide more than pointer size comparison and swap. In this case, a monotonous counter is connected to a pointer, which each time atomically increases with each use, so the value always changes. Some Intel chips support this, but there is no GNU shell.
Here is the game from the game. Two threads, A and B, reach a point in test_get , where they have the same value for result , and it is not NULL . Then the following sequence occurs:
- Thread A passes comparison and swap and returns
result from test_get - Topic A modifies the contents of
result - Thread B
result interaction, getting any thread A placed there - Thread A ends with
result and calls test_put - Thread A passes comparison and swap to
test_put , returning the result to gHead - Thread B reaches comparison and swap in
test_get and passes - Thread B is now corrupted by
gHead with a value from Thread A
Here is a similar scenario with three threads, which does not require Thread A to change the result .
- Thread A passes comparison and swap and returns
result from test_get - Topic A does not change the contents of
result - Thread B
result interaction, getting the right value in scratch - Topic C calls
test_put with an unrelated value and succeeds - Thread A calls
test_put and will be able to return result to gHead - Thread B reaches comparison and swap in
test_get and passes - Thread B is now damaged by
gHead by leaking any added C thread
In any case, the problem is that thread A compares and swaps twice, once to get then again for put, before thread B reaches comparison and swap for get. Any value of thread B for zero should be discarded, but not because the value in gHead looks right.
Any solution that makes this less likely, without even preventing it, just makes it difficult to track the error.
Note that the scratch variable is just the name for wherever the dereferenced result value is placed before the start of the atomic instruction. Deleting a name removes the time slice between dereferencing and comparison that may be interrupted.
We also note that atomic means that it succeeds or fails in general. Any consistent pointer reading is implicitly atomic on the equipment in question. As for other threads, there is no broken point at which only half of the pointer has been read.