How does the x86 PAUSE instruction work in a spinlock, and can it be used in other scenarios?

The PAUSE instruction is commonly used in the test loop of a spinlock, to moderate the tight loop while some other thread holds the lock. It is said to be equivalent to a few NOP instructions. Can someone tell me exactly how it optimizes a spinlock? It seems to me that even NOP instructions are a waste of CPU time. Do they reduce CPU usage?

Another question: can I use the PAUSE instruction for other, similar purposes? For example, I have a busy thread that keeps scanning some places (for example, a queue) to retrieve new nodes; however, sometimes the queue is empty and the thread is just wasting processor time. Putting the thread to sleep and having other threads wake it up may be an option, but the thread is critical, so I do not want it to sleep. Can the PAUSE instruction reduce CPU usage here? At the moment the thread uses 100% of a physical core.

Thanks.

+29
multithreading x86 spinlock
Jan 18 '11 at 15:09
4 answers

PAUSE notifies the CPU that this is a spinlock wait loop, so that memory and cache accesses can be optimized. See also "pause instruction in x86" for more details on avoiding memory-order mis-speculation when leaving the spin loop.

PAUSE may actually stall the CPU for a short time in order to save power. Older CPUs decode it as REP NOP, so you do not have to check whether it is supported; they simply do nothing (NOP) as fast as possible.
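As a rough illustration of where PAUSE sits in a spinlock, here is a minimal test-and-test-and-set sketch in C11. The names `pause_lock`, `pause_lock_acquire`, `pause_lock_try`, `pause_lock_release` and `cpu_relax` are hypothetical and not from any library; `_mm_pause()` is the compiler intrinsic that emits PAUSE, and the non-x86 fallback is only an assumption so the sketch still compiles elsewhere.

```c
#include <stdatomic.h>

#if defined(__x86_64__) || defined(__i386__) || defined(_M_X64) || defined(_M_IX86)
#include <immintrin.h>
#define cpu_relax() _mm_pause()   /* emits PAUSE (encoded as F3 90, i.e. REP NOP) */
#else
#define cpu_relax() ((void)0)     /* assumed fallback: plain busy loop on non-x86 */
#endif

/* Hypothetical minimal spinlock: 0 = free, 1 = held. */
typedef struct { atomic_int locked; } pause_lock;

/* Single attempt; returns 1 if the lock was taken, 0 if it was already held. */
static inline int pause_lock_try(pause_lock *l) {
    return atomic_exchange_explicit(&l->locked, 1, memory_order_acquire) == 0;
}

static inline void pause_lock_acquire(pause_lock *l) {
    for (;;) {
        if (pause_lock_try(l))
            return;
        /* Lock is held: spin on plain loads and hint the CPU with PAUSE,
           retrying the atomic exchange only once the lock looks free. */
        while (atomic_load_explicit(&l->locked, memory_order_relaxed) != 0)
            cpu_relax();
    }
}

static inline void pause_lock_release(pause_lock *l) {
    atomic_store_explicit(&l->locked, 0, memory_order_release);
}
```

The inner loop only reads the lock word, so the waiting core does not keep stealing the cache line with writes; PAUSE goes in that read-only loop.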

See also https://software.intel.com/en-us/articles/benefitting-power-and-performance-sleep-loops




Update: I do not think it is a good idea to use PAUSE when polling the queue unless you turn the queue into some kind of spinlock (and there is no obvious way to do that).

Spinning for a very long time is still very bad, even with PAUSE.

+20
Jan 18 2018-11-18T00:

The processor suffers a severe performance penalty when exiting the loop, because it detects a possible memory order violation. The PAUSE instruction hints to the processor that the code sequence is a spin-wait loop. The processor uses this hint to avoid the memory order violation in most situations, which greatly improves processor performance. For this reason, it is recommended that a PAUSE instruction be placed in all spin-wait loops. An additional function of the PAUSE instruction is to reduce the power consumed by Intel processors.

[source: Intel manual]

+12
09 Oct '11 at 11:29

The PAUSE instruction is also used on hyper-threaded processors to reduce the performance impact on the other hyper-threads, presumably by yielding more CPU time to them.

The Intel article below describes this and, not surprisingly, recommends avoiding busy-wait loops on such processors: https://software.intel.com/en-us/articles/long-duration-spin-wait-loops-on-hyper-threading-technology-enabled-intel-processors

+1
Oct. 15 '14 at 18:50

Intel recommends using the PAUSE instruction only when the spin loop is very short.

As I understand from your question, the waits in your case are very long. In that case, spin loops are not recommended.

You wrote that you have a "thread that keeps scanning some places (for example, a queue) to retrieve new nodes."

In this case, Intel recommends using the synchronization API functions of your operating system. For example, you can create an event that is signaled when a new node appears in the queue, and simply wait for that event with WaitForSingleObject(Handle, INFINITE). The queue signals this event whenever a new node is added.
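The same idea can be sketched without Windows events. The following is a hedged illustration using a POSIX mutex and condition variable as a stand-in for the event plus WaitForSingleObject (a swapped-in technique, not the API named above). The type `evq`, the functions `evq_init`, `evq_push` and `evq_pop_blocking`, and the fixed capacity are all hypothetical.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define EVQ_CAPACITY 16  /* hypothetical fixed capacity for this sketch */

/* Hypothetical bounded queue of ints guarded by a mutex and a condition variable. */
typedef struct {
    int items[EVQ_CAPACITY];
    size_t head, count;
    pthread_mutex_t mu;
    pthread_cond_t not_empty;
} evq;

static inline void evq_init(evq *q) {
    q->head = q->count = 0;
    pthread_mutex_init(&q->mu, NULL);
    pthread_cond_init(&q->not_empty, NULL);
}

/* Producer side: returns false if the queue is full. */
static inline bool evq_push(evq *q, int v) {
    bool ok = false;
    pthread_mutex_lock(&q->mu);
    if (q->count < EVQ_CAPACITY) {
        q->items[(q->head + q->count) % EVQ_CAPACITY] = v;
        q->count++;
        ok = true;
        pthread_cond_signal(&q->not_empty);  /* wake one sleeping consumer */
    }
    pthread_mutex_unlock(&q->mu);
    return ok;
}

/* Consumer side: blocks in the OS instead of spinning while the queue is empty. */
static inline int evq_pop_blocking(evq *q) {
    pthread_mutex_lock(&q->mu);
    while (q->count == 0)                    /* loop guards against spurious wakeups */
        pthread_cond_wait(&q->not_empty, &q->mu);
    int v = q->items[q->head];
    q->head = (q->head + 1) % EVQ_CAPACITY;
    q->count--;
    pthread_mutex_unlock(&q->mu);
    return v;
}
```

The scanning thread would call `evq_pop_blocking` in its loop; while the queue is empty it is descheduled rather than burning a core, at the cost of a wake-up latency when a node arrives.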

According to the Intel Optimization Manual, the PAUSE instruction is typically used with software threads executing on two logical processors located in the same processor core, waiting for a lock to be released. Such short wait loops tend to last between tens and a few hundred cycles (i.e. 20-500 CPU cycles), so performance-wise it is more beneficial to wait while occupying the CPU than to yield to the OS.

500 CPU cycles on a Core i7-7700K running at 4500 MHz take about 0.00000011 seconds (roughly 111 ns), that is, about 1/9,000,000 of a second: the CPU could run such a 500-cycle loop about 9 million times per second.
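For anyone who wants to redo this arithmetic with other clock rates, here is a small sketch; the helper names `cycles_to_ns` and `loops_per_second` are hypothetical.

```c
/* Convert a cycle count to nanoseconds at a given clock frequency in MHz.
   One cycle lasts 1000 / MHz nanoseconds. */
static inline double cycles_to_ns(double cycles, double mhz) {
    return cycles * 1000.0 / mhz;
}

/* How many loops of the given length (in cycles) fit into one second. */
static inline double loops_per_second(double cycles_per_loop, double mhz) {
    return mhz * 1e6 / cycles_per_loop;
}
```

With 500 cycles at 4500 MHz this gives about 111 ns per loop and 9 million loops per second.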

As you can see, this PAUSE instruction is designed for very short periods of time.

On the other hand, every call to an API function such as Sleep() incurs the expensive cost of a context switch, which can be 10,000+ cycles; it also incurs the cost of the ring 3 to ring 0 transition, which can be 1,000+ cycles.

If there are more threads than available processor cores (multiplied by the hyper-threading factor, if any), and a thread is switched out in the middle of a critical section, then another thread waiting for that critical section may wait a really long time, at least 10,000+ cycles, so the PAUSE instruction will be useless.

For more information, see the following articles:

When the wait loop is expected to last thousands of cycles or more, it is preferable to yield to the operating system by calling one of the OS synchronization API functions, such as WaitForSingleObject on Windows.
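One way to combine the two recommendations is to spin with PAUSE for a bounded number of iterations and only then fall back to the OS. The sketch below shows just the bounded-spin half; `spin_wait_bounded` and `cpu_relax` are hypothetical names, and the spin budget is an assumption the caller has to tune.

```c
#include <stdatomic.h>
#include <stdbool.h>

#if defined(__x86_64__) || defined(__i386__) || defined(_M_X64) || defined(_M_IX86)
#include <immintrin.h>
#define cpu_relax() _mm_pause()   /* PAUSE hint inside the spin loop */
#else
#define cpu_relax() ((void)0)     /* assumed fallback on non-x86 targets */
#endif

/* Spin on *flag for at most max_spins PAUSE iterations.
   Returns true if the flag was observed nonzero, false if the budget ran out,
   in which case the caller should block in the OS instead of spinning on. */
static inline bool spin_wait_bounded(atomic_int *flag, unsigned max_spins) {
    for (unsigned i = 0; i < max_spins; i++) {
        if (atomic_load_explicit(flag, memory_order_acquire) != 0)
            return true;
        cpu_relax();
    }
    /* One last check after the budget is spent. */
    return atomic_load_explicit(flag, memory_order_acquire) != 0;
}
```

A caller would try something like `spin_wait_bounded(&ready, 100)` first and, if it returns false, wait on an OS primitive such as an event or condition variable.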

In conclusion: in your scenario the PAUSE instruction is not the best choice, since your wait is long and PAUSE is designed for very short loops. A PAUSE takes about 131 cycles on Skylake or later processors. For example, that is just 31.19 ns on an Intel Core i7-7700K @ 4.20GHz (Kaby Lake).

On earlier processors such as Haswell, it takes about 9 cycles, which is 2.81 ns on an Intel Core i5-4430 @ 3GHz. Thus, for long waits it is better to yield control to other threads through the OS synchronization API functions than to occupy the CPU with a PAUSE loop.
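Figures like these can be estimated on your own machine. Below is a rough sketch, assuming GCC or Clang on x86, that times a run of back-to-back PAUSE instructions with the timestamp counter. The function name `pause_ticks_avg` is hypothetical. The TSC usually ticks at a fixed reference rate rather than the current core clock, so the result only approximates core cycles, and it includes loop overhead.

```c
#if defined(__x86_64__) || defined(__i386__)
#include <x86intrin.h>    /* GCC/Clang: __rdtsc */
#include <immintrin.h>    /* _mm_pause */
#define HAVE_X86_TSC 1
#else
#define HAVE_X86_TSC 0
#endif

/* Average TSC ticks per PAUSE over n back-to-back executions.
   Returns -1.0 on non-x86 targets or when n is 0. */
static inline double pause_ticks_avg(unsigned n) {
#if HAVE_X86_TSC
    if (n == 0)
        return -1.0;
    unsigned long long start = __rdtsc();
    for (unsigned i = 0; i < n; i++)
        _mm_pause();
    unsigned long long end = __rdtsc();
    return (double)(end - start) / n;
#else
    (void)n;
    return -1.0;
#endif
}
```

Calling it with a large count, for example `pause_ticks_avg(100000)`, and repeating the measurement a few times smooths out interrupts and frequency changes.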

0
Jul 05 '17 at 4:22