Note. I resolutely edited this question for clarity, making a mess of this brainstorming publicly. However, the algorithms described and the question of whether they are sufficient to avoid racing should be identical.
I am trying to implement semaphores that avoid the race condition described in glibc error 12674:
http://sourceware.org/bugzilla/show_bug.cgi?id=12674
Basically, an efficient way to write a futex-based semaphore, if you don't care about this race condition for annihilation, is:
Message:
- The atomic value of the increment semaphore.
- Check the number of waiters. If it is nonzero, execute futex wake.
Wait:
- Atomic decrement is if-positive by the value of the semaphore (and return if it succeeds).
- If this fails, increase the number of waiters and fulfill the futex expectations.
- In wake mode, lowering waiters, cycle and retry.
Step 2 of the post operation is something that is unsafe.
There are two obvious implementations that avoid this problem, but both have serious problems:
The first solution is not to keep the waiter’s counter or flag in general and always do a futex wake in the mail. This is certainly safe, but the goal of futexes wins (keeping a useless case in user space).
The second solution is not to keep the waiter’s counter, but instead, so that the semaphore value is reduced to -1 in case of a wait conflict. Postoperative surgery goes from -1 to 1 and wakes up all the waiters. One of them succeeds in reducing the value of the semaphore to 0, and if it remains, they set the value to -1, so the next message will perform another wakeup. This solution is also obviously safe, but it leads to the fall of threads competing for the semaphore at publication.
Thus, the first solution works well only for semaphores, which are always approved, and the latter works well only for semaphores, which usually have no more than one waiter. Also not suitable for general use.
Now, to try and find a real solution ...
At this stage, it should be noted that there are several other requirements that complicate implementation in the real world. The post operation must be safe using an asynchronous signal (therefore, basically it cannot use locks), and the wait operation is required to provide an interrupt on signals, a timeout, or a thread cancellation. In practice, this means that the post-operation should be able to safely “undo” any changes that it makes to the semaphore state. I hush up such problems because I don't seem to have problems with them, but they make some otherwise obvious changes / solutions impossible, so anyone offering new approaches as an answer should be aware of these problems ...
I have a proposed solution, but I'm not sure that it is subject to new (and possibly worse) race conditions. I will describe it here, and I hope that some gods (or demigods, at least) can have kindness to reconsider it for correctness.
My approach is a bit of a hybrid of the second "bad" solution described above and the original approach (from the race described in the glibc bug report). It uses both the waiter’s counter and the waiter’s flag (-1, stored in the semaphore value).
The key change to the wait operation is that it stores -1 instead of 0 in the semaphore value whenever there are waiters (a pre-existing waiter counter or by itself as a new waiter).
Wait:
- Read the value of the semaphore.
- If the value is positive, define the new semaphore value as follows: if the value is exactly 1 and there are waiters, the new value should be -1; otherwise just decrease the old value. Use the compare and replace command to update the value and return success if it succeeds.
- Otherwise, incrementing the waiters atomically replaces the value 0 with -1 and waits for futex with -1 as the value.
- In wake mode, lowering waiters, cycle and retry.
The key change to the post operation is that it reads at the waiter until , increasing the value of the semaphore, and not later:
Message:
- Reading and saving semaphore values.
- Reading and keeping the number of waiters.
- Define a new semaphore value (-1 becomes 1, otherwise just increase).
- Atomic compare-and-swap to update semaphore value. On failure, goto 1.
- If the number of waiters saved was nonzero or the semaphore value was -1, do a futex wake.
The comparison and replacement in step 4 provides some security that the number of waiters is still correct, but there is still an ABA run - between steps 1 and 4, it is possible that other threads perform wait and send operations that leave the semaphore, the value matches its initial value.
When searching for cases where this algorithm may not call waiters, we need to consider only cases when the initial number of waiter counters is 0 and the semaphore value is not -1. In addition, if the semaphore value is positive and pre-existing waiters no longer exist, the current post is not responsible for any awakenings, therefore this case is also not interesting. We leave the consideration of cases when the wait operation begins with a zero semaphore and a zero count of expectations. In this situation, in order to avoid the conditions of the race, any event that occurs between steps 2 and 4, which leads to the emergence of new waiters, must change the value of the semaphore so that the failure and change in step 4 fail. Obviously, any single intermediate record or expectation will change the semaphore value (to 1 or -1, respectively), so a more specific case is a sequence of operations that lead to a semaphore value of 0, but the presence of waiters.
I believe that this cannot happen because of the procedure performed in the wait operation, but I was not 100% convinced.
Finally, here are a few examples of races that happen if you weaken my algorithm to establish the motivation for what it does, if that is not clear.
Failure 1: using a pure count of expectations, flag -1 in the semaphore value. The trivial race is as follows:
- The semaphore value starts at 0
- Topic 1 starts publishing, reads 0 semaphore value and 0 standby counter.
- Thread 2 starts wait, increments the wait counter, and waits for futex.
- Thread 1 performs successful comparisons and replacements, returns without awakening the waiter.
Failure 2: using the waiters counter and having new waiters sets the semaphore value to -1, but simply decreases the semaphore value when the wait is executed (without setting it to -1 if other threads are still waiting):
- The semaphore value starts at 0
- Topic 1 starts publishing, reads 0 semaphore value and 0 standby counter.
- Themes 2 and 3 are waiting, increasing the number of expectations and pending futex.
- Insert 4 messages by setting the semaphore value to 1.
- Thread 2 wakes up and reduces the value of the semaphore to 0, the number of waiters is 1.
- Thread 1 performs successful comparisons and replacements, returns without waking thread 3.