If you have multiple threads waiting on the same resource (in this case, semaphores and a queue), you create a bottleneck. You set all tasks in one queue, even if you have several employees. Logically, this may make sense if workers are usually idle, but the whole point of the thread pool is to deal with a heavily loaded scenario where workers remain busy (for maximum end-to-end input). Using one input queue will be especially bad in a multiprocessor system where all workers read and write the queue head when they try to get the next task. Although the lock conflict may be low, the queue head pointer will still need to be exchanged / transferred from one processor cache to another each time it is updated.
Think about the ideal case: all workers are always busy. When a new task is in the queue, you want it to be sent to the employee who will perform the current / pending tasks first.
If you had a non-partisan oracle as a client who could tell you which worker should queue a new task, and each employee had his own turn, then you could implement each employee with his own multitasking scenario, a queue with one reader and always send new tasks in the best turn, thereby eliminating worker conflicts in one common input queue. Of course, you don’t have such an oracle, but this mechanism still works very well until the employee has completed the tasks or the queues are balanced. “Job theft” refers to these cases, but at the same time reduces competition compared to a single-queue situation.
See also: Is Stealing always the most suitable user-level thread scheduling algorithm?
source share