Semaphore without destruction race condition

Note: I heavily edited this question for clarity after making a mess of it brainstorming in public. However, the algorithms described and the question of whether they suffice to avoid races should be identical.

I am trying to implement semaphores that avoid the race condition described in glibc bug 12674:

http://sourceware.org/bugzilla/show_bug.cgi?id=12674

Basically, the efficient way to write a futex-based semaphore, if you don't care about this destruction race condition, is (sketched in code after the two lists below):

Post:

  • Atomically increment the semaphore value.
  • Check the waiter count; if it is nonzero, perform a futex wake.

Wait:

  • Atomically decrement-if-positive the semaphore value (and return if this succeeds).
  • Otherwise, increment the waiter count and perform a futex wait.
  • On wake, decrement the waiter count, loop, and retry.
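
For concreteness, here is a minimal sketch of this baseline algorithm in C, assuming Linux futexes and C11 atomics. The type and helper names are mine, purely for illustration:

    #include <stdatomic.h>
    #include <unistd.h>
    #include <sys/syscall.h>
    #include <linux/futex.h>

    struct fsem {
        atomic_int value;   /* semaphore count */
        atomic_int waiters; /* number of threads in futex wait */
    };

    static void futex_wait(atomic_int *addr, int expected) {
        syscall(SYS_futex, addr, FUTEX_WAIT, expected, NULL, NULL, 0);
    }
    static void futex_wake(atomic_int *addr, int count) {
        syscall(SYS_futex, addr, FUTEX_WAKE, count, NULL, NULL, 0);
    }

    void fsem_post(struct fsem *s) {
        atomic_fetch_add(&s->value, 1);        /* step 1 */
        if (atomic_load(&s->waiters) != 0)     /* step 2: the unsafe read */
            futex_wake(&s->value, 1);
    }

    void fsem_wait(struct fsem *s) {
        for (;;) {
            int v = atomic_load(&s->value);
            while (v > 0) {                    /* step 1: dec-if-positive */
                if (atomic_compare_exchange_weak(&s->value, &v, v - 1))
                    return;
            }
            atomic_fetch_add(&s->waiters, 1);  /* step 2: register as waiter */
            futex_wait(&s->value, 0);          /* sleeps only if value still 0 */
            atomic_fetch_sub(&s->waiters, 1);  /* step 3: on wake, retry */
        }
    }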

Step 2 of the post operation is what is unsafe: once the value has been incremented, another thread may consume the post, destroy the semaphore, and free its memory, so the subsequent read of the waiter count can touch freed memory.

There are two obvious implementations that avoid this problem, but both have serious drawbacks:

The first is to keep no waiter count or flag at all and always perform a futex wake on post. This is certainly safe, but it defeats the purpose of futexes (keeping the uncontended case in userspace).

The second is to keep no waiter count, but instead have waiters set the semaphore value to -1 when they contend. The post operation then changes -1 to 1 and wakes all waiters. One of them succeeds in decrementing the semaphore value to 0, and any that remain set the value back to -1 so the next post will perform another wake. This solution is also obviously safe, but it leads to a stampede of threads competing for the semaphore on every post.

Thus, the first solution performs well only for semaphores that are always contended, and the second only for semaphores that usually have at most one waiter. Neither is suitable for general use.

Now, on to an attempt at a real solution...

At this point, it should be noted that there are several other requirements that complicate a real-world implementation. The post operation must be async-signal-safe (so it basically cannot use locks), and the wait operation must support interruption by signals, a timeout, and thread cancellation. In practice, this means the wait operation must be able to safely "undo" any changes it makes to the semaphore state. I am glossing over these issues because I don't seem to have problems with them, but they make some otherwise-obvious changes/solutions impossible, so anyone proposing new approaches as an answer should be aware of these constraints...

I have a proposed solution, but I'm not sure whether it is subject to new (and possibly worse) race conditions. I will describe it here, and I hope some concurrency gods (or demigods, at least) will be kind enough to review it for correctness.

My approach is something of a hybrid of the second "bad" solution described above and the original approach (from the race described in the glibc bug report). It uses both a waiter count and a waiter flag (-1, stored in the semaphore value).

The key change in the wait operation is that it stores -1 instead of 0 in the semaphore value whenever there are waiters (either pre-existing ones per the waiter count, or itself as a new waiter); a code sketch follows the list.

Wait:

  • Read the value of the semaphore.
  • If the value is positive, determine the new semaphore value as follows: if the value is exactly 1 and there are waiters, the new value should be -1; otherwise just decrement the old value. Use a compare-and-swap to update the value, and return success if it succeeds.
  • Otherwise, increment the waiter count, atomically replace a semaphore value of 0 with -1, and perform a futex wait with -1 as the expected value.
  • On wake, decrement the waiter count, loop, and retry.
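
A sketch of this wait, reusing the struct fsem type and the futex helpers from the baseline sketch above (signal, timeout, and cancellation handling omitted for brevity):

    void hybrid_wait(struct fsem *s) {
        for (;;) {
            int v = atomic_load(&s->value);                /* step 1 */
            if (v > 0) {                                   /* step 2 */
                /* Taking the last unit while others wait stores -1, not 0. */
                int newv = (v == 1 && atomic_load(&s->waiters) > 0)
                               ? -1 : v - 1;
                if (atomic_compare_exchange_strong(&s->value, &v, newv))
                    return;
                continue;                                  /* value changed; retry */
            }
            atomic_fetch_add(&s->waiters, 1);              /* step 3 */
            int expect = 0;
            atomic_compare_exchange_strong(&s->value, &expect, -1); /* 0 -> -1 */
            futex_wait(&s->value, -1);
            atomic_fetch_sub(&s->waiters, 1);              /* step 4: retry */
        }
    }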

The key change in the post operation is that it reads the waiter count before incrementing the semaphore value, not after (again sketched below the list):

Post:

  • Read and save the semaphore value.
  • Read and save the waiter count.
  • Determine the new semaphore value (-1 becomes 1; otherwise just increment).
  • Atomically compare-and-swap to update the semaphore value. On failure, go back to step 1.
  • If the saved waiter count was nonzero or the saved semaphore value was -1, perform a futex wake.
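
A sketch of this post, with the same assumptions as the sketches above. The list says only "perform a futex wake"; waking a single waiter is my assumption (consistent with the -1 flag, since the woken thread re-arms the flag if waiters remain):

    void hybrid_post(struct fsem *s) {
        for (;;) {
            int v = atomic_load(&s->value);        /* step 1 */
            int w = atomic_load(&s->waiters);      /* step 2 */
            int newv = (v == -1) ? 1 : v + 1;      /* step 3 */
            if (!atomic_compare_exchange_strong(&s->value, &v, newv))
                continue;                          /* step 4: back to step 1 */
            if (w != 0 || v == -1)                 /* step 5 */
                futex_wake(&s->value, 1);
            return;
        }
    }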

The compare-and-swap in step 4 gives some assurance that the waiter count read in step 2 is still meaningful, but there is still an ABA issue: between steps 1 and 4, it is possible for other threads to perform wait and post operations that leave the semaphore value equal to its initial value.

In looking for cases where this algorithm could fail to wake waiters, we only need to consider cases where the waiter count initially read is 0 and the semaphore value is not -1. Furthermore, if the semaphore value is positive and there are no pre-existing waiters, the current post is not responsible for any wakeups, so that case is not interesting either. That leaves the cases where the post operation begins with a semaphore value of 0 and a waiter count of 0. In this situation, to avoid a race, any event occurring between steps 2 and 4 that results in new waiters must change the semaphore value so that the compare-and-swap in step 4 fails. Obviously, any single intervening post or wait will change the semaphore value (to 1 or -1, respectively), so the more specific case to worry about is a sequence of operations that leaves the semaphore value at 0 but with waiters present.

I believe this cannot happen, because of the procedure the wait operation follows, but I have not been able to convince myself 100%.


Finally, here are a few examples of the races that occur if my algorithm is weakened, to establish the motivation for what it does, in case that is not clear.

Failure 1: using a bare waiter count, with no -1 flag in the semaphore value. The trivial race is:

  • The semaphore value starts at 0
  • Thread 1 begins a post, reading a semaphore value of 0 and a waiter count of 0.
  • Thread 2 begins a wait, increments the waiter count, and begins the futex wait.
  • Thread 1 performs a successful compare-and-swap and returns without waking the waiter.

Failure 2: using the waiter count, and having new waiters set the semaphore value to -1, but simply decrementing the semaphore value when a wait succeeds (without setting it to -1 if other threads are still waiting):

  • The semaphore value starts at 0
  • Thread 1 begins a post, reading a semaphore value of 0 and a waiter count of 0.
  • Threads 2 and 3 begin waits, incrementing the waiter count and waiting on the futex.
  • Thread 4 posts, setting the semaphore value to 1.
  • Thread 2 wakes and decrements the semaphore value to 0; the waiter count is now 1.
  • Thread 1 performs a successful compare-and-swap and returns without waking thread 3.
1 answer

First of all, let me offer a few alternative approaches that you might want to consider.

  • Approach No. 1 (x86-specific, fast): CMPXCHG8B / CMPXCHG16B.

    x86 platforms have an atomic compare-and-exchange of twice the pointer width. On 32-bit this is the 8-byte CMPXCHG8B; on 64-bit, CMPXCHG16B atomically compares and exchanges a full 16 bytes of data. Using this, you can update the waiter count and the semaphore count atomically in one operation. A futex can only wait on a single 32-bit field, but that shouldn't be a serious problem here.
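
    A rough sketch of the 32-bit variant, where both fields share one 8-byte unit so a plain 64-bit CAS (compiling to CMPXCHG8B) snapshots the waiter count and bumps the value in a single operation. The 64-bit variant is the same idea with a 16-byte struct and CMPXCHG16B (e.g. GCC's __atomic builtins with -mcx16). Layout, names, and the little-endian assumption are mine; the futex helpers are from the first sketch above:

        #include <stdint.h>
        #include <string.h>
        #include <stdatomic.h>

        struct fields { int32_t value; int32_t waiters; };
        struct dwsem  { _Atomic uint64_t whole; };  /* overlays struct fields */

        static uint64_t pack(struct fields f) {
            uint64_t u; memcpy(&u, &f, 8); return u;
        }
        static struct fields unpack(uint64_t u) {
            struct fields f; memcpy(&f, &u, 8); return f;
        }

        void dwsem_post(struct dwsem *s) {
            uint64_t old = atomic_load(&s->whole);
            struct fields f;
            do {
                f = unpack(old);
                f.value += 1;                  /* bump the value... */
            } while (!atomic_compare_exchange_weak(&s->whole, &old, pack(f)));
            /* 'old' is the snapshot we replaced: its waiter count was read
               in the same atomic unit as the value we incremented. */
            if (unpack(old).waiters != 0)
                futex_wake((atomic_int *)&s->whole, 1); /* value field is the
                                                           low half on x86 */
        }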

  • Approach No. 2 (portable, limited): packed counts.

    If a limit of 2^16 on both the waiter count and the semaphore value is acceptable, simply pack both counts into a single 32-bit field.
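
    A sketch of the packed layout, again reusing the futex helpers from the first sketch. The particular field split (value in the low 16 bits, waiter count in the high 16 bits) is my choice:

        #define PVAL(x)  ((x) & 0xffffu)   /* semaphore value */
        #define PWTRS(x) ((x) >> 16)       /* waiter count */

        struct psem { atomic_uint word; };

        void psem_post(struct psem *s) {
            unsigned old = atomic_load(&s->word);
            /* One CAS bumps the value and snapshots the waiter count. */
            while (!atomic_compare_exchange_weak(&s->word, &old, old + 1))
                ;                          /* note: value must stay below 2^16 */
            if (PWTRS(old) != 0)
                futex_wake((atomic_int *)&s->word, 1);
        }

        void psem_wait(struct psem *s) {
            for (;;) {
                unsigned old = atomic_load(&s->word);
                if (PVAL(old) > 0) {
                    if (atomic_compare_exchange_weak(&s->word, &old, old - 1))
                        return;            /* took one unit */
                } else if (atomic_compare_exchange_weak(&s->word, &old,
                                                        old + 0x10000)) {
                    /* registered as a waiter in the same word we wait on */
                    futex_wait((atomic_int *)&s->word, (int)(old + 0x10000));
                    unsigned cur = atomic_load(&s->word);   /* de-register */
                    while (!atomic_compare_exchange_weak(&s->word, &cur,
                                                         cur - 0x10000))
                        ;
                }
            }
        }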

  • Approach No. 3 (portable, some overhead): use a semaphore-within-a-semaphore to protect against the post race.

    Reserve 8 bits of the semaphore counter to act as a lock over post operations. The post operation increments this counter (blocking if it would overflow) in the same atomic operation that increments the true semaphore count. It then does its work with the waiters field, and finally atomically decrements the lock counter. After the decrement, if the previous value was 255, wake all waiters (this causes a thundering herd, but it should be extremely rare).

    On destruction, acquire the lock 255 times (you can increment it by more than one at a time), blocking as necessary. Once the lock has been acquired 255 times, you know all posts have completed, and it is safe to destroy the semaphore.

    Downside: posts now require two atomic operations, and the maximum semaphore value is 2^24 - 1. Additionally, 255 levels of recursive entry from asynchronous signal handlers will deadlock.
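
    A sketch of the post side under this scheme. The field widths and constants are my assumptions, and the spin on lock saturation is a simplification (a real version would futex-wait instead, and destruction would block until it had drained all 255 lock units):

        #include <limits.h>

        #define LOCK_UNIT 0x01000000u            /* one post in progress */
        #define LOCK_MASK 0xff000000u            /* 8-bit post lock */
        /* low 24 bits: true semaphore count (max 2^24 - 1) */

        struct lsem { atomic_uint word; atomic_int waiters; };

        void lsem_post(struct lsem *s) {
            unsigned old = atomic_load(&s->word);
            do {                                 /* take a lock unit and bump
                                                    the count in one CAS */
                while ((old & LOCK_MASK) == LOCK_MASK)
                    old = atomic_load(&s->word); /* lock saturated: spin */
            } while (!atomic_compare_exchange_weak(&s->word, &old,
                                                   old + LOCK_UNIT + 1));
            if (atomic_load(&s->waiters) != 0)   /* safe: destruction cannot
                                                    finish while we hold a unit */
                futex_wake((atomic_int *)&s->word, 1);
            unsigned prev = atomic_fetch_sub(&s->word, LOCK_UNIT);
            if ((prev & LOCK_MASK) == LOCK_MASK) /* lock was saturated: */
                futex_wake((atomic_int *)&s->word, INT_MAX);
        }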

These approaches are simpler and easier to prove correct, and are most likely faster. However, their limitations may make them unacceptable for your case (though the CMPXCHG8B approach should work quite well on x86). And one more:

  • Approach No. 4 (arch-independent, complex, fast): kernel modification.

    One option is to modify the kernel to offer a low-cost, safe way to read the waiters field without faulting if the memory has been freed. For example, you could add a syscall that registers a thread-specific data structure; in that thread-specific data page you might have a "fault handler jump address". If the program segfaults while the jump address is nonzero, the kernel simply jumps there instead of raising SIGSEGV. Alternatively, you could have a similar mechanism to simply suppress the faulting instruction.

    Now all you have to do is:

    • At libc init and thread startup, register these thread-specific structures and store a pointer to them in TLS data.
    • In the post operation, arrange for faults to be suppressed around the read of the waiter count. If a fault occurs, don't do the wake (a wake would be harmless anyway if the memory in question has been reused for another purpose).

    And there you go: you get the fast algorithm, but also protection against the deletion race. The catch is that you have to hack the kernel's segv handlers to do it. It might be worth looking at SEH on Windows; a similar mechanism would work very well here.

In any case, I don't see anything wrong with your approach, but I may be missing something. It might be a good idea to raise it on the appropriate mailing lists and consult the futex maintainers; they would likely be interested in adding kernel support to make this easier for you.
