How does this math rounding function work?

Can someone explain what this function does?

    static inline void round_to_zero(volatile float *f)
    {
        *f += 1e-18;
        *f -= 1e-18;
    }

I mean, I understand that it adds 1e-18 and then subtracts it again. What I don't understand is what effect that has on the float passed to it. The reason I'm trying to figure this out is that I use doubles in some code (which I converted from float) that calls this function. It's audio code, and the function above comes from this library:

https://github.com/swh/lv2/blob/master/include/ladspa-util.h

I am wondering whether it will work on a double as is, or whether it needs to be changed for the additional precision that a double has. I suspect it knocks the last few bits of data off the float, if there are any, although I don't quite understand how it does that. If that is what it does, I think I will need to change the exponent of the constant so that it matches a double.

TIA, Pete

+5
2 answers

The following code demonstrates what this function does.

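    #include <stdio.h>

    /* round_to_zero() is the function from the question */
    int main(void)
    {
        float a = -1.0f;
        a /= 1e100;            /* underflows to negative zero */
        printf("%f\n", a);     /* prints -0.000000 */
        round_to_zero(&a);
        printf("%f\n", a);     /* prints 0.000000 */
        return 0;
    }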

What you need to know is that IEEE-754 floating-point numbers have two possible values for 0. There is a positive 0 and a negative 0. The round_to_zero function converts a negative 0 to a positive 0.

The value 1e-18 is well below 1 LSB of the double-precision number 1.0 (about 2.2e-16), so it does not disturb ordinary values, and it is far larger than any denormal double. Therefore, I do not think any changes are necessary to use this function with double (except for changing the type of the argument, of course).

+1

I thought I should come back to this and add the following information.

While the answer about converting negative zero to positive zero is true and was useful to me, there is more to it than that.

Adding 1e-18 to a float and then subtracting it again effectively wipes out very small values held in the float. This is used in audio applications because filters can produce very small floats: a function that repeatedly divides a value keeps shrinking it. When a number becomes denormalized (as Caskey mentioned), processing that number becomes up to 100 times slower on many processors (including x86).

By adding a number that is much larger than a denormal-sized number for this data type, you destroy the small value stored in the variable. Subtracting the same larger number then leaves the variable holding zero, which does not slow down processing. The tiny value is destroyed because the significand of the type is not wide enough to hold both the very small value and the much larger value you just added.

For instance:

Start with an audio sample with a value of 1.0f.

Pass it 40 times through a function that divides by 10, leaving the value 1e-40.

v = 0.0100000 e-38 (the float type has about 7 decimal digits of precision and its smallest normal exponent is about e-38, so this is roughly how the value looks in memory)

This is now a denormal value for the float type, and the processor will handle it very slowly. How do we get rid of the slowdown? Make the value equal to zero. So:

Add 1e-18; result: 1.00000000 e-18 (note that the original 1e-40 is too small to be represented in the significand once it holds the much larger value 1e-18).

Then subtract 1e-18: 0.00000000 e-0

Therefore, we produce zero by destroying the original denormal value, and our cpu thanks us.

+1

Source: https://habr.com/ru/post/1203042/

