Casts give different truncation results

I am having trouble predicting how my C code will truncate results. Consider the following:

float fa, fb, fc;
short ia, ib;
fa = 160;
fb = 0.9;
fc = fa * fb;
ia = (short)fc;
ib = (short)(fa * fb);

Results: ia = 144, ib = 143.

I can see an argument for either result, but I don't understand why the two calculations are handled differently. Can someone point me to where this behavior is defined, or explain the difference?

Edit: the results above come from code compiled with MS Visual C++ Express 2010 on an Intel Core i3-330M. I get the same results with gcc version 4.4.3 (Ubuntu 4.4.3-4ubuntu5) in VirtualBox on the same machine.

+4
3 answers

The compiler is allowed to use higher precision for a subexpression such as fa*fb than it uses when the value is assigned to a float variable such as fc. So it is the fc= assignment that changes the result very slightly (and that in turn happens to change the truncated integer value).

+7

aschepler explained the mechanics of what is going on well, but the underlying problem is that your code uses a value that does not exist as a float, in a way that depends unstably on how that value is approximated. If you want to multiply by 0.9 (the actual number 0.9 = 9/10, not the floating-point value 0.9 or 0.9f), you should multiply by 9 and then divide by 10, or forget about floating-point types and use a decimal arithmetic library.

A quick and dirty workaround, when the unstable points are isolated as in your example, is to add a value before truncating (usually 0.5) that you know is larger than the error but smaller than the distance to the next integer.

+3

It depends on the compiler. On mine (gcc 4.4.3), both expressions give the same result, namely 144, probably because the identical expression is optimized away.

Others have explained well what happened. To put it another way, the difference is probably because your compiler internally promotes the floats to 80-bit FPU registers before doing the multiplication, and then converts the result back to either float or short.

If my hypothesis is correct, then writing ib = (short)(float)(fa * fb); should give the same result as casting fc to short.

0

Source: https://habr.com/ru/post/1332820/