What is the best way to perform branching using Intel SSE?

Question


I am writing a compiler, and I have to emit code for conditional branches on floating-point values. For example, to compile code like this:

    if(a <= b){
        //1. Do something
    } else {
        //2. Do something else
    }

Here a and b are floating-point variables. I just need to jump to 2 if the condition is not true, and otherwise fall through to 1. I am considering the optimization here at the compiler level, given what is in 1 and 2.

I need something that works with all the comparison operators: >, >=, <, <=, == and !=.

The way I found to do the comparison is to use CMPLTSD (and the equivalent instructions for the other relational operators). But with that, I have to dedicate an SSE register to the result, then move its value to a general-purpose register (e.g. eax), and finally compare that value with 0.

I also saw that the UCOMISD instruction should set the flags correctly, but apparently it does not work the way I thought.

So what is the best way to handle this code? Are there any better instructions than the first solution I have?

By best, I mean a general solution to this problem. If possible, I would like the code to behave the same way as when comparing integers (cmp a, b; jge label). Of course, I would prefer the fastest instructions for this.

+6

assembly compiler-construction sse intel

Baptiste Wicht Mar 04 '12 at 19:43


2 answers

Important: @harold's answer is almost exactly right, but it has one subtly wrong aspect that can drive you crazy later in a very important edge case: its NaN handling is backwards from most languages (e.g. C++).

As @harold correctly says, the result of an unordered comparison is stored in the parity flag.

However, the comparison is unordered whenever either operand is NaN, as described in this post. In that case ucomisd sets ZF, PF and CF together, so if the parity flag is ignored, NaN can appear both less than and equal to absolutely every number, including NaN.

So, if you want your language to match C++ behavior, where any comparison with NaN other than != returns false, you want:

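For <= :

    ; assuming a is in xmm0 and b is in xmm1
    ucomisd xmm1, xmm0    ; compare b with a
    jb else_label         ; taken when b < a, or unordered (NaN)

For < :

    ; assuming a is in xmm0 and b is in xmm1
    ucomisd xmm1, xmm0    ; compare b with a
    jbe else_label        ; taken when b <= a, or unordered (NaN)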

This is confirmed by the following gcc disassembly, where I return a >= b:

    144e: 66 0f 2e c8    ucomisd %xmm0,%xmm1
    1452: 0f 93 c0       setae   %al

Here it uses setae, which is the register-setting equivalent of jae. It then returns immediately without checking the parity flag.
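One way to see why no parity check is needed in that sequence is to model the carry flag in C. This is only a sketch of the documented flag behavior, not the hardware itself: the names ucomisd_cf, setae_after_ucomisd and setae_mismatches are made up for illustration, and the model assumes that ucomisd sets CF when its first operand is less than the second or when either operand is NaN. Note that the disassembly is in AT&T syntax, where the operands are reversed, so `ucomisd %xmm0,%xmm1` compares xmm1 against xmm0.

```c
#include <math.h>
#include <stdbool.h>

/* Model of CF after `ucomisd x, y` (Intel operand order):
   CF is set when x < y, or when the operands are unordered (NaN). */
static bool ucomisd_cf(double x, double y) {
    return isnan(x) || isnan(y) || x < y;
}

/* setae stores 1 exactly when CF == 0. */
static bool setae_after_ucomisd(double x, double y) {
    return !ucomisd_cf(x, y);
}

/* Counts the pairs where the modelled setae result differs from C's x >= y. */
static int setae_mismatches(void) {
    const double v[] = {-2.5, 0.0, 3.0, INFINITY, NAN};
    const int n = (int)(sizeof v / sizeof v[0]);
    int bad = 0;
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++) {
            if (setae_after_ucomisd(v[i], v[j]) != (v[i] >= v[j])) {
                bad++;
            }
        }
    }
    return bad;
}
```

Under this model setae_mismatches() should return 0: because NaN sets CF, testing CF alone already yields false for an "above or equal" condition, which is why the operand order matters.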

As for why it is ja and not jg, @harold's answer is still a clear and correct explanation.

And, of course, you don't have to use the ordered comparison. You can use the unordered comparison, as shown in the previous answer, if you want absolutely every number to be less than, greater than and equal to NaN in your program or language (where even NaN < NaN is true!). As you can see, that can also be slightly slower, since it requires additional checks.

+1

Mike Fairhurst Apr 13 '17 at 15:55


harold · Accepted Answer · 2012-03-04T20:25:00+0000

The condition codes for ucomisd do not correspond to the signed integer comparison codes, but to the unsigned ones (with "unordered" in the parity flag). It's a little strange, I admit, but it is all clearly documented. The code, if you really want to branch, could be something like this for <= :

    ucomisd a,b
    ja else     ; greater
    jp else     ; unordered
    ; code for //1 goes here
    jmp end
    else:
    ; code for //2 goes here
    end:

For < :

    jae else    ; greater or equal
    jp else     ; unordered

I could list them all if you really want to, but you can just look at the condition codes for ucomisd and match them with what you need.
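The flag settings and the two jump sequences above can be checked against C's own comparison rules with a small model. This is a sketch rather than real code generation: Flags, ucomisd_model, le_takes_else, lt_takes_else and count_mismatches are hypothetical names, and the flag table is the one documented for ucomisd (unordered sets ZF, PF and CF; greater clears all three; less sets CF; equal sets ZF).

```c
#include <math.h>
#include <stdbool.h>

/* Flags as written by `ucomisd x, y` (Intel operand order). */
typedef struct { bool zf, pf, cf; } Flags;

static Flags ucomisd_model(double x, double y) {
    Flags f = {false, false, false};      /* x > y: all clear */
    if (isnan(x) || isnan(y)) {
        f.zf = f.pf = f.cf = true;        /* unordered */
    } else if (x < y) {
        f.cf = true;
    } else if (x == y) {
        f.zf = true;
    }
    return f;
}

/* Jump conditions, same definitions as for unsigned integer compares. */
static bool ja(Flags f)  { return !f.cf && !f.zf; }
static bool jae(Flags f) { return !f.cf; }
static bool jp(Flags f)  { return f.pf; }

/* `ja else; jp else`: true when control reaches the else branch of a <= b. */
static bool le_takes_else(double a, double b) {
    Flags f = ucomisd_model(a, b);
    return ja(f) || jp(f);
}

/* `jae else; jp else`: true when control reaches the else branch of a < b. */
static bool lt_takes_else(double a, double b) {
    Flags f = ucomisd_model(a, b);
    return jae(f) || jp(f);
}

/* Counts the pairs where the modelled branch disagrees with C's operators. */
static int count_mismatches(void) {
    const double v[] = {-1.0, 0.0, 1.0, INFINITY, NAN};
    const int n = (int)(sizeof v / sizeof v[0]);
    int bad = 0;
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++) {
            if (le_takes_else(v[i], v[j]) != !(v[i] <= v[j])) bad++;
            if (lt_takes_else(v[i], v[j]) != !(v[i] < v[j])) bad++;
        }
    }
    return bad;
}
```

In this model count_mismatches() should return 0, and a NaN operand always reaches the else branch because of the jp test. The remaining operators can be checked the same way by adding one helper per jump sequence.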
