How does CMOV improve processor performance?

I understand that when a branch is easy to predict, it is better to use an IF statement, because the branch is then essentially free. I have learned that when a branch is hard to predict, CMOV is better. However, I do not quite understand how this is achieved.

Surely the problem remains the same: we do not know the address of the next instruction to execute? So I don't understand how a CMOV that executes far down the pipeline can help the instruction fetcher (10 CPU cycles in the past) choose the right path and prevent a pipeline flush.

Can someone please help me understand how CMOV improves branching?

2 answers

CMOV instructions do not steer the control flow. They are instructions that compute a result based on the condition codes, that is, predicated instructions. Some architectures (like ARM) can predicate many kinds of instructions on condition codes, but x86 can only predicate "mov", that is, the conditional move (CMOV). They are decoded and then executed with some latency to determine the result of the instruction.

Branches, on the other hand, are predicted and actually steer the execution of instructions. The branch predictor "looks ahead" of the instruction fetcher, specifically looking for branch instructions, and predicts the path to steer the flow. Think of a railroad, where a person up ahead switches the tracks left or right to tell the train where to go. If he picks the wrong direction, the train has to stop, back up, and then set off again in the right direction. A lot of time is wasted.

CMOVs, on the other hand, do not steer the flow. They are just instructions that take extra time (and create additional dependencies) to work out the correct result of the move based on the condition codes. Think of the train, instead of deciding between left and right, taking a straight path that requires no switch but is a little slower (obviously it is more complicated than that, but this is the best I can think of right now).

CMOVs used to be really bad (very high latency), but have since improved to be quite fast, which makes them much more useful and worth using for performance.
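The difference between steering the flow and computing a result can be sketched in C. This is only an illustration: the function names are made up for this example, and whether a compiler emits a conditional jump or a CMOV for either function depends on the compiler and its optimization settings.

```c
#include <stdint.h>

/* Control dependency: which statement runs next depends on cond,
   so a branchy compilation of this code relies on branch prediction. */
uint32_t select_branch(int cond, uint32_t a, uint32_t b)
{
    if (cond)
        return a;
    return b;
}

/* Data dependency: there is no branch in the source. The result is
   computed from cond, a, and b, much as CMOV computes its result
   from the flags and its two operands. */
uint32_t select_branchless(int cond, uint32_t a, uint32_t b)
{
    /* mask is all ones when cond is nonzero, all zeros otherwise */
    uint32_t mask = (uint32_t)0 - (uint32_t)(cond != 0);
    return (a & mask) | (b & ~mask);
}
```

Both functions return the same value for the same inputs; the second one simply always does the same amount of work, which is the "straight but slightly slower track" from the analogy.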

Hope this helps.


Can someone please help me understand how CMOV improves branching?

Well, it does not improve branching, it removes it. CMOV can be thought of as two instructions in one, a MOV and a NOP. Which one is executed depends on the flags. So internally it may look like

if (cond) { mov dst, src } else { nop } 

...

Of course, the problem area remains the same - we do not know the address of the next instruction to execute?

No. The next instruction is always the one following the CMOV, so the instruction pipeline is not invalidated and reloaded (leaving branch prediction and other optimizations aside). It is one continuous flow of macro-ops. A simple example is as follows

 if (ecx==5) eax = TRUE else eax = FALSE 

in basic asm:

    cmp ecx,5          ; is ecx==5
    jne unequal        ; what is the address of the next instruction? conditional branch
    mov eax,TRUE       ; possibility one
    jmp fin
    unequal:           ; possibility two
    mov eax,FALSE
    fin:
    nop

with CMOV

    cmp ecx,5
    mov eax, FALSE     ; mov doesn't affect flags
    mov ebx, TRUE      ; because CMOV doesn't take immediate src operands, use EBX for alternative
    cmove eax, ebx     ; executes as MOV if zero-flag is set, otherwise as NOP
    nop                ; always the next instruction, no pipeline stall
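For readers more comfortable with C, the CMOV sequence above can be modeled roughly like this. It is a sketch only: `cmove_model` and `is_five` are hypothetical names, the variables merely stand in for the registers and the zero flag, and TRUE/FALSE are assumed to be 1 and 0.

```c
#include <stdint.h>

/* Hypothetical model of "cmove dst, src": yields src when the zero
   flag is set, otherwise the old dst. Either way, execution simply
   continues with the next statement. */
static uint32_t cmove_model(uint32_t dst, uint32_t src, int zf)
{
    return zf ? src : dst;
}

uint32_t is_five(uint32_t ecx)
{
    int zf = (ecx == 5);              /* cmp ecx,5       */
    uint32_t eax = 0;                 /* mov eax, FALSE  */
    uint32_t ebx = 1;                 /* mov ebx, TRUE   */
    eax = cmove_model(eax, ebx, zf);  /* cmove eax, ebx  */
    return eax;
}
```

Note that the result depends on the old value of `eax`, on `ebx`, and on the flag, which is where the extra data dependencies of CMOV come from.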

Is it worth it on current processors? Clearly YES. In my experience, and (of course) depending on the algorithm, the speed gain is significant and worth the effort.
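A typical case where the difference may show up is a loop whose condition depends on unpredictable data. The sketch below uses hypothetical function names and makes no promise about which instructions a given compiler emits; an optimizing compiler may turn either version into the other, so any speed-up has to be measured on the target machine.

```c
#include <stddef.h>
#include <stdint.h>

/* Branchy form: on random data the "if" is hard to predict. */
size_t count_at_least_branch(const int32_t *v, size_t n, int32_t threshold)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (v[i] >= threshold)
            count++;
    }
    return count;
}

/* Branchless form: the comparison yields 0 or 1, which is added
   unconditionally, so there is no data-dependent jump in the source. */
size_t count_at_least_branchless(const int32_t *v, size_t n, int32_t threshold)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++)
        count += (size_t)(v[i] >= threshold);
    return count;
}
```

Both functions return the same count; only the way the condition is consumed differs.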


Source: https://habr.com/ru/post/978698/

