Which operator is faster (> or >=), (< or <=)?

Question


Is < cheaper (faster) than <= , and likewise, > cheaper (faster) than >= ?

Disclaimer: I know that I can measure, but it will only be on my machine, and I'm not sure if the answer can be "implementation specific" or something like that.


performance optimization c operators assembly

Dimitar slavchev Aug 1 '12 at 16:15


2 answers

TL;DR

The difference between the four operators appears to be negligible: they all run in about the same time for me (this may differ on other systems!). So when in doubt, just use the operator that makes the most sense for the situation (especially when you're working with C++).

So, without further ado, here is a long explanation:

Assuming integer comparison:

As for the assembly, the results are platform dependent. On my computer (Apple LLVM Compiler 4.0, x86_64), the generated assembly looks like this:

 a < b (uses 'setl'):

     movl    $10, -8(%rbp)
     movl    $15, -12(%rbp)
     movl    -8(%rbp), %eax
     cmpl    -12(%rbp), %eax
     setl    %cl
     andb    $1, %cl
     movzbl  %cl, %eax
     popq    %rbp
     ret

 a <= b (uses 'setle'):

     movl    $10, -8(%rbp)
     movl    $15, -12(%rbp)
     movl    -8(%rbp), %eax
     cmpl    -12(%rbp), %eax
     setle   %cl
     andb    $1, %cl
     movzbl  %cl, %eax
     popq    %rbp
     ret

 a > b (uses 'setg'):

     movl    $10, -8(%rbp)
     movl    $15, -12(%rbp)
     movl    -8(%rbp), %eax
     cmpl    -12(%rbp), %eax
     setg    %cl
     andb    $1, %cl
     movzbl  %cl, %eax
     popq    %rbp
     ret

 a >= b (uses 'setge'):

     movl    $10, -8(%rbp)
     movl    $15, -12(%rbp)
     movl    -8(%rbp), %eax
     cmpl    -12(%rbp), %eax
     setge   %cl
     andb    $1, %cl
     movzbl  %cl, %eax
     popq    %rbp
     ret
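
The answer does not show the C source behind these listings. As a rough sketch only, functions of the following shape could produce them; the function names are hypothetical, and the values 10 and 15 are taken from the two `movl` instructions at the top of each listing.

```c
/* Hypothetical source for the listings above: each function compares
 * two local ints and returns the result as 0 or 1. The names are
 * illustrative only; they do not appear in the original answer. */
int less(void)          { int a = 10, b = 15; return a <  b; }  /* setl  */
int less_equal(void)    { int a = 10, b = 15; return a <= b; }  /* setle */
int greater(void)       { int a = 10, b = 15; return a >  b; }  /* setg  */
int greater_equal(void) { int a = 10, b = 15; return a >= b; }  /* setge */
```

Compiling with something like `cc -O0 -S file.c` writes the assembly to a file for inspection. The exact instructions depend on the compiler, its version, and the target.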

The assembly alone doesn't tell us much, so let's move on to a benchmark:

And, ladies and gentlemen, the results are in. I wrote the following test program (I know that clock() is not the best way to time something like this, but it will have to do for now).

 #include <stdio.h>
 #include <time.h>

 #define ITERS 100000000

 int v = 0;

 /* Each test times ITERS iterations of a single comparison operator.
    The elapsed clock() ticks are cast to unsigned long because the
    underlying type of clock_t is implementation-defined. */
 void testL(void)
 {
     clock_t start = clock();
     v = 0;
     for (int i = 0; i < ITERS; i++) {
         v = i < v;
     }
     printf("%s: %lu\n", __func__, (unsigned long)(clock() - start));
 }

 void testLE(void)
 {
     clock_t start = clock();
     v = 0;
     for (int i = 0; i < ITERS; i++) {
         v = i <= v;
     }
     printf("%s: %lu\n", __func__, (unsigned long)(clock() - start));
 }

 void testG(void)
 {
     clock_t start = clock();
     v = 0;
     for (int i = 0; i < ITERS; i++) {
         v = i > v;
     }
     printf("%s: %lu\n", __func__, (unsigned long)(clock() - start));
 }

 void testGE(void)
 {
     clock_t start = clock();
     v = 0;
     for (int i = 0; i < ITERS; i++) {
         v = i >= v;
     }
     printf("%s: %lu\n", __func__, (unsigned long)(clock() - start));
 }

 int main(void)
 {
     testL();
     testLE();
     testG();
     testGE();
     return 0;
 }

Which, on my machine (compiled with -O0), gives me this (5 separate runs):

 testL: 337848
 testLE: 338237
 testG: 337888
 testGE: 337787

 testL: 337768
 testLE: 338110
 testG: 337406
 testGE: 337926

 testL: 338958
 testLE: 338948
 testG: 337705
 testGE: 337829

 testL: 339805
 testLE: 339634
 testG: 337413
 testGE: 337900

 testL: 340490
 testLE: 339030
 testG: 337298
 testGE: 337593

I would say that the differences between these operators are minor at best and do not carry much weight in modern computing.
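
To put a number on "minor", here is a small sketch that averages the five runs listed above for each test and reports the gap between the slowest and fastest mean. The tick counts are copied from the output above; the helper names `mean_ticks` and `relative_spread` are made up for this illustration.

```c
#include <stddef.h>

#define RUNS  5
#define TESTS 4

/* clock() ticks copied from the five runs above.
 * Columns: testL, testLE, testG, testGE. */
static const double ticks[RUNS][TESTS] = {
    {337848, 338237, 337888, 337787},
    {337768, 338110, 337406, 337926},
    {338958, 338948, 337705, 337829},
    {339805, 339634, 337413, 337900},
    {340490, 339030, 337298, 337593},
};

/* Mean tick count of one test (column) across all runs. */
double mean_ticks(size_t test)
{
    double sum = 0.0;
    for (size_t run = 0; run < RUNS; run++)
        sum += ticks[run][test];
    return sum / RUNS;
}

/* (slowest mean - fastest mean) / fastest mean, as a fraction. */
double relative_spread(void)
{
    double lo = mean_ticks(0), hi = lo;
    for (size_t t = 1; t < TESTS; t++) {
        double m = mean_ticks(t);
        if (m < lo) lo = m;
        if (m > hi) hi = m;
    }
    return (hi - lo) / lo;
}
```

For these numbers `relative_spread()` comes out well under one percent, which is smaller than the variation testL alone shows from its fastest to its slowest run.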


Richard J. Ross III Aug 1 '12 at 16:28


old_timer · Accepted Answer · 2012-08-01T18:03:14+0000

It varies. Start by examining different instruction sets and how compilers use them. Take the OpenRISC 32 (or32), for example, which is clearly MIPS-inspired but handles conditionals differently. The or32 has compare-and-set-flag instructions: compare these two registers and set the flag if the first is less than or equal (unsigned); compare these two registers and set the flag if they are equal. Then there are two conditional branch instructions: branch on flag set and branch on flag clear. The compiler has to follow one of these paths, but less than, less than or equal, greater than, and so on all use the same number of instructions, with the same execution time when the conditional branch is taken and the same execution time when it is not.

On most architectures it is definitely true that taking a branch takes longer than not taking it, because the pipeline has to be flushed and refilled. Some do branch prediction and the like to help with that problem.

On some architectures the size of the instruction may vary: comparing gpr0 with gpr1 versus comparing gpr0 with the immediate number 1234 may require a larger instruction. You will see this a lot with x86, for example. So although both cases may be a branch-if-less-than, how the comparison is encoded, depending on which registers happen to hold which values, can make a performance difference (x86 certainly does a lot of pipelining, a lot of caching, etc. to make up for these issues).

Another similar example is MIPS and or32, where r0 is always zero. It is not really a general-purpose register: if you write to it, it does not change, because it is hardwired to zero. So comparing against 0 is cheap, while comparing against some other number MAY cost you more if an extra instruction or two is needed to load that immediate into a GPR before the comparison can happen. The worst case is having to spill a register to the stack or memory to free it up for the immediate.

Some architectures, such as ARM, have conditional execution: with full ARM (not Thumb) instructions you can make execution conditional on a per-instruction basis. So if you have code

 if(i==7) j=5; else j=9;

the pseudocode for ARM would be

 cmp   i,#7
 moveq j,#5
 movne j,#9

There is no actual branch, so there are no pipeline issues; you sail right through, very fast.
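
At the C level the same idea can be sketched as follows. The function names are hypothetical, and whether a compiler actually emits conditional instructions rather than a branch for any of these forms depends on the target and optimization level, so this is an illustration and not a guarantee.

```c
/* All three functions return 5 when i == 7 and 9 otherwise. */

/* The if/else form from the text above. */
int select_if(int i)
{
    int j;
    if (i == 7)
        j = 5;
    else
        j = 9;
    return j;
}

/* The same choice as a conditional expression. */
int select_ternary(int i)
{
    return (i == 7) ? 5 : 9;
}

/* Explicitly branch-free at the source level: (i == 7) evaluates
 * to 1 or 0, so the result is 9 - 4 = 5 or 9 - 0 = 9. */
int select_arith(int i)
{
    return 9 - 4 * (i == 7);
}
```

Disassembling each function on the target of interest shows which of them, if any, ends up without a branch.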

Comparing one architecture with another is interesting here. On some, as mentioned (MIPS, or32), you have to execute a specific instruction to perform the comparison. On others, such as x86, MSP430, and the vast majority, every ALU operation changes the flags. ARM and the like change the flags only if you tell them to, as shown above. So in a

 while(--len)
 {
     //do something
 }

loop, the subtraction of 1 also sets the flags. If the body of the loop is simple enough, you can make the whole thing conditional, so you save the separate compare and branch instructions and you avoid the pipeline penalty. MIPS mitigates this a little: compare-and-branch is a single instruction, and the instruction after the branch is always executed (the branch delay slot), which saves a little in the pipe.
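
A minimal sketch of such a count-down loop in C, assuming a simple summing body (the function name and the body are invented for illustration). The `do { } while (--len)` shape lets the decrement double as the loop test, which on flag-setting architectures may save a separate compare instruction.

```c
#include <stddef.h>

/* Sums data[0..len-1], walking from the last element to the first. */
long sum_countdown(const int *data, size_t len)
{
    long total = 0;

    /* Guard the empty case: with an unsigned len, --len on 0 would
     * wrap around instead of ending the loop. */
    if (len == 0)
        return total;

    do {
        total += data[len - 1];
    } while (--len);   /* the decrement is also the exit condition */

    return total;
}
```

Note that a bare `while (--len)` as in the text runs the body len - 1 times, so it suits cases where the first pass is handled separately or len is known to be at least 1.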

The general answer is that you will not see a difference: the number of instructions, the execution time, and so on are the same for the various conditionals. Special cases, such as small immediates versus big immediates, may have an effect in corner cases, or the compiler may simply choose to do things differently depending on which comparison you use. If you try to rewrite your algorithm to give the same answer but use less than instead of greater than or equal, you could change the code enough to get a different instruction stream. Likewise, if you run a performance test that is too simple, the compiler can and will optimize out the comparison completely and just generate the results, and that may vary with your test code, causing different execution. The key to all of this is to disassemble the things you want to compare and see how the instructions differ. That will tell you whether you should expect to see any difference in execution.
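
One common way to keep a too-simple test from being folded away is to route an operand through a `volatile` object, so the compiler has to load it and perform a real comparison on every iteration. The sketch below assumes that approach; the function names and the iteration count are made up for illustration, and the only way to confirm what was actually generated is still to disassemble the result.

```c
#define ITERS 1000000

/* Counts how many i in [0, ITERS) satisfy i < limit.
 * limit is volatile, so it is re-read on every iteration and the
 * comparison cannot be evaluated at compile time. */
long count_less(int limit_value)
{
    volatile int limit = limit_value;
    long hits = 0;
    for (int i = 0; i < ITERS; i++)
        hits += (i < limit);
    return hits;
}

/* Same loop with <= in place of <. */
long count_less_equal(int limit_value)
{
    volatile int limit = limit_value;
    long hits = 0;
    for (int i = 0; i < ITERS; i++)
        hits += (i <= limit);
    return hits;
}
```

Timing these two functions and then comparing their disassembly (for example with `objdump -d`) shows whether the two operators produced different instruction streams on a given compiler and target.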
