Intel Threading Building Blocks concurrent queue: using pop() over pop_if_present()

What is the difference in using a pop() blocking call compared to

 while(pop_if_present(...)) 

Which one should be preferred over the other, and why?

I am looking for a deeper understanding of the trade-off between polling yourself, as in the case of while(pop_if_present(...)), and letting the system do it for you. This is a fairly general topic. For example, with boost::asio I could either call myIO.run(), which blocks, or do the following:

 while(1) { myIO.poll(); }

One possible explanation is that the thread calling while(pop_if_present(...)) stays busy, which is bad. But someone or something has to poll for the asynchronous event. Why and how can it be cheaper when that is delegated to the OS or the library? Is it because the OS or library is smart about polling, for example by using exponential backoff?

+4
2 answers

Intel's TBB library is open source, so I took a look...

It appears that pop_if_present() essentially checks whether the queue is empty and returns immediately if it is. If not, it tries to take the item at the head of the queue (which may fail, since another thread may have come along and taken it). If it misses, it performs an atomic_backoff pause before checking again. atomic_backoff simply spins the first few times it is called (doubling its spin loop count each time), but after a certain number of pauses it yields to the OS scheduler instead of spinning, on the assumption that since it has already been waiting a while, it might as well wait politely.

For the plain pop() function, if there is nothing in the queue, it performs atomic_backoff waits until something appears in the queue for it to take.
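The spin-then-yield behaviour described above can be sketched roughly as follows. This is not TBB's actual atomic_backoff code: the class name Backoff, the threshold kSpinLimit, and the spin body are assumptions made purely for illustration.

```cpp
#include <thread>

// Hypothetical stand-in for TBB's internal atomic_backoff. The name,
// the threshold, and the spin body are assumptions for illustration.
class Backoff {
public:
    // Pause once. Early calls spin; later calls yield to the OS scheduler.
    void pause() {
        if (count_ <= kSpinLimit) {
            for (int i = 0; i < count_; ++i) {
                spin_once();
            }
            count_ *= 2;  // double the spin length for the next pause
        } else {
            std::this_thread::yield();  // waited a while: give up the time slice
        }
    }

    // True once pause() has switched from spinning to yielding.
    bool yielding() const { return count_ > kSpinLimit; }

    // Start over with short spins, e.g. after a successful pop.
    void reset() { count_ = 1; }

private:
    static constexpr int kSpinLimit = 16;  // assumed threshold

    static void spin_once() {
        // A volatile write keeps the compiler from optimising the loop away.
        volatile int sink = 0;
        sink = 1;
    }

    int count_ = 1;
};
```

A polling consumer would call pause() after each failed attempt to take an item and reset() after a success, so short waits stay cheap and long waits stop burning the CPU.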

Note that there are at least 2 interesting things about this (to me, anyway):

  • the pop() function spin-waits (up to a point) for something to show up in the queue; it does not yield to the OS unless it has to wait more than a short moment. So, as you might expect, there is not much reason to spin yourself by calling pop_if_present() unless you have something else to do between calls to pop_if_present()

  • when pop() does yield to the OS, it does so by simply giving up its time slice. It does not block the thread on a synchronization object that can be signaled when an item is placed in the queue; it seems to go into a sleep/poll cycle, checking the queue for something to pop. This surprised me a little.

Take this analysis with a grain of salt. The source I used may be a bit old (it is actually from concurrent_queue_v2.h and .cpp), because the more recent concurrent_queue has a different API: there is no pop() or pop_if_present(), just a try_pop() function in the latest class concurrent_queue interface. The old interface has been moved (possibly changed somewhat) to the concurrent_bounded_queue class. It appears that the newer concurrent queues can be configured, when the library is built, to use OS synchronization objects instead of busy waiting and polling.

+3

With while(pop_if_present(...)) you are doing a brute-force busy wait (also called spinning) on the queue. When the queue is empty, you waste cycles by keeping the CPU busy until either an item is pushed into the queue by another thread running on a different CPU, or the OS decides to give your CPU to some other, possibly unrelated, thread or process.

You can see how this can be bad if you have only one CPU: the producer thread cannot push, and thus cannot stop the consumer from spinning, until at least the end of the consumer's time slice plus the overhead of a context switch. Clearly a mistake.

With multiple CPUs this can work better, provided the OS schedules (or you force) the producer thread onto a different CPU. This is the basic idea of a spin-lock - a synchronization primitive built directly on special processor instructions, such as compare-and-swap or load-linked/store-conditional, and commonly used inside the operating system to communicate between interrupt handlers and the rest of the kernel, and to build higher-level constructs such as semaphores.
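To make the spin-lock idea concrete, here is a minimal sketch built on compare-and-swap using std::atomic. The names SpinLock and count_with_spinlock are hypothetical, and this is not how any particular kernel or TBB implements its locks.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Minimal spin lock built on compare-and-swap (illustrative only).
class SpinLock {
public:
    void lock() {
        bool expected = false;
        // Try to flip false -> true. On failure compare_exchange_weak
        // overwrites `expected`, so reset it and keep spinning.
        while (!locked_.compare_exchange_weak(expected, true,
                                              std::memory_order_acquire,
                                              std::memory_order_relaxed)) {
            expected = false;
        }
    }

    void unlock() { locked_.store(false, std::memory_order_release); }

private:
    std::atomic<bool> locked_{false};
};

// Hypothetical helper: several threads bump one counter under the lock.
long count_with_spinlock(int threads, int iterations) {
    SpinLock lock;
    long counter = 0;
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t) {
        workers.emplace_back([&] {
            for (int i = 0; i < iterations; ++i) {
                lock.lock();
                ++counter;  // protected by the spin lock
                lock.unlock();
            }
        });
    }
    for (auto& w : workers) {
        w.join();
    }
    return counter;
}
```

A waiting thread never enters the kernel here; it just retries the atomic instruction, which is why this only pays off when the lock holder is running on another CPU and releases quickly.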

With a blocking pop(), if the queue is empty you enter a sleep wait, that is, you ask the OS to put the consumer thread into a non-schedulable state until an event (a push onto the queue) occurs from another thread. The key point here is that the processor is available for other (hopefully useful) work. The TBB implementation actually tries hard to avoid the sleep, since it is expensive (entering the kernel, rescheduling, etc.). The goal is to optimize the normal case, where the queue is not empty and the item can be retrieved quickly.

The choice is really simple, though: always sleep-wait, i.e. use the blocking pop(), unless you have to busy-wait (and that is the case in real-time systems, OS interrupt context, and some very specialized applications).
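For comparison, a sleep-waiting pop() in the textbook sense can be sketched with a mutex and a condition variable. BlockingQueue and sum_from_producer are hypothetical names; this is a plain standard-library illustration, not the TBB implementation.

```cpp
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>

// Hypothetical blocking queue: pop() sleeps until a push() signals it.
template <typename T>
class BlockingQueue {
public:
    void push(T value) {
        {
            std::lock_guard<std::mutex> guard(mutex_);
            items_.push(std::move(value));
        }
        not_empty_.notify_one();  // wake one sleeping consumer
    }

    T pop() {
        std::unique_lock<std::mutex> guard(mutex_);
        // The thread is descheduled here while the queue is empty,
        // so the CPU is free for other work.
        not_empty_.wait(guard, [this] { return !items_.empty(); });
        T value = std::move(items_.front());
        items_.pop();
        return value;
    }

private:
    std::mutex mutex_;
    std::condition_variable not_empty_;
    std::queue<T> items_;
};

// Hypothetical demo: a producer thread pushes 1..n, the caller pops and sums.
int sum_from_producer(int n) {
    BlockingQueue<int> queue;
    std::thread producer([&] {
        for (int i = 1; i <= n; ++i) {
            queue.push(i);
        }
    });
    int sum = 0;
    for (int i = 0; i < n; ++i) {
        sum += queue.pop();
    }
    producer.join();
    return sum;
}
```

The cost of this design is the kernel entry and rescheduling on every real wait, which is the expense the spin phase in front of the sleep is meant to avoid when items arrive quickly.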

Hope this helps a bit.

+2

Source: https://habr.com/ru/post/1301651/

