Efficient Floating-Point Comparison (Cortex-A8)

I have a large (~100,000 element) array of floating-point values and a threshold (also a float).

The problem is that I have to compare each value in the array with the threshold, but transferring the NEON flags to the ARM core takes a very long time (about 20 cycles according to the profiler).

Is there an efficient way to compare these values?

NOTE: Since the rounding error does not matter, I tried the following:

    float arr[10000];
    float threshold;
    ....
    int a = arr[20]; // e.g.
    int t = threshold;
    if (t > a) {....}

But in this case, I get the following instruction sequence:

    vldr.32      s0, [r0]
    vcvt.s32.f32 s0, s0
    vmov         r0, s0   <--- takes 20 cycles, just like `vmrs APSR_nzcv, fpscr`
    cmp          r0, r1        in the case of a floating-point comparison

Since the conversion takes place in NEON, it makes no difference whether I compare integers in the way described or compare floats.

+6
4 answers

If the floats are 32-bit IEEE-754, int is also 32-bit, and there are no +infinity, -infinity, or NaN values, we can compare floats as ints with a little trick:

    #include <stdio.h>
    #include <limits.h>
    #include <assert.h>

    #define C_ASSERT(expr) extern char CAssertExtern[(expr)?1:-1]
    C_ASSERT(sizeof(int) == sizeof(float));
    C_ASSERT(sizeof(int) * CHAR_BIT == 32);

    int isGreater(float* f1, float* f2)
    {
      int i1, i2, t1, t2;

      i1 = *(int*)f1;
      i2 = *(int*)f2;

      // Convert sign-magnitude to two's complement so that
      // integer ordering matches float ordering
      t1 = i1 >> 31;
      i1 = (i1 ^ t1) + (t1 & 0x80000001);

      t2 = i2 >> 31;
      i2 = (i2 ^ t2) + (t2 & 0x80000001);

      return i1 > i2;
    }

    int main(void)
    {
      float arr[9] = { -3, -2, -1.5, -1, 0, 1, 1.5, 2, 3 };
      float thr;
      int i;

      // Make sure floats are 32-bit IEEE-754 and
      // reinterpreted as integers as we want/expect
      {
        static const float testf = 8873283.0f;
        unsigned testi = *(unsigned*)&testf;
        assert(testi == 0x4B076543);
      }

      thr = -1.5;
      for (i = 0; i < 9; i++)
      {
        printf("%f %s %f\n", arr[i], "<=\0> " + 3*isGreater(&arr[i], &thr), thr);
      }

      thr = 1.5;
      for (i = 0; i < 9; i++)
      {
        printf("%f %s %f\n", arr[i], "<=\0> " + 3*isGreater(&arr[i], &thr), thr);
      }

      return 0;
    }

Output:

    -3.000000 <= -1.500000
    -2.000000 <= -1.500000
    -1.500000 <= -1.500000
    -1.000000 >  -1.500000
    0.000000 >  -1.500000
    1.000000 >  -1.500000
    1.500000 >  -1.500000
    2.000000 >  -1.500000
    3.000000 >  -1.500000
    -3.000000 <= 1.500000
    -2.000000 <= 1.500000
    -1.500000 <= 1.500000
    -1.000000 <= 1.500000
    0.000000 <= 1.500000
    1.000000 <= 1.500000
    1.500000 <= 1.500000
    2.000000 >  1.500000
    3.000000 >  1.500000

Of course, if your threshold does not change, it makes sense to pre-compute its final integer value (the one isGreater() uses in the comparison) once, outside the loop.

If you are worried about undefined behavior in C/C++ in the above code, you can rewrite it in assembly.
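If the undefined behavior is the only concern, one option short of assembly is to copy the bits with memcpy, which compilers typically reduce to a plain integer load. The sketch below is a variant of the idea above, not the answer's own code: the names float_key and count_greater are hypothetical, counting stands in for whatever you do per element, and it assumes 32-bit IEEE-754 floats with no NaNs. The threshold's key is computed once, outside the loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits");

/* Hypothetical helper: read a float from memory and map its bit pattern
   to an int32_t whose signed ordering matches the float ordering.
   Assumes IEEE-754 binary32 and no NaNs. */
int32_t float_key(const float *p)
{
    uint32_t u;
    memcpy(&u, p, sizeof u);                    /* well-defined type punning */
    int32_t mag = (int32_t)(u & 0x7FFFFFFFu);   /* magnitude bits, always >= 0 */
    return (u >> 31) ? -mag : mag;              /* sign-magnitude -> two's complement */
}

/* Hypothetical helper: count the elements strictly greater than threshold. */
size_t count_greater(const float *arr, size_t n, float threshold)
{
    const int32_t tkey = float_key(&threshold); /* pre-computed once */
    size_t count = 0;
    for (size_t i = 0; i < n; i++)
        count += (float_key(&arr[i]) > tkey);   /* integer compare only */
    return count;
}
```

With the test array from the answer, count_greater(arr, 9, -1.5f) should agree with the six "greater" lines in the first half of the output, and count_greater(arr, 9, 1.5f) with the two in the second half. Whether the compiler keeps the loads on the integer side is something to confirm in the generated assembly.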

+5

If your data is float, then you should do the comparison with floats, for example:

    float arr[10000];
    float threshold;
    ....
    float a = arr[20]; // e.g.
    if (threshold > a) {....}

Otherwise you will incur expensive float-to-int conversions.
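A float comparison does not have to go through the flags for every element. With NEON intrinsics the compare result can stay in a vector register, so the costly NEON-to-ARM transfer happens once per array instead of once per value. The following is a minimal sketch under assumptions, not something taken from this thread: count_below is a hypothetical name, counting stands in for the real per-element work, and the code falls back to a plain scalar loop when NEON is not available.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Hypothetical helper: count the elements strictly below threshold. */
size_t count_below(const float *arr, size_t n, float threshold)
{
    assert(arr != NULL || n == 0);
    size_t i = 0;
    uint32_t count = 0;

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    const float32x4_t thr = vdupq_n_f32(threshold);
    uint32x4_t acc = vdupq_n_u32(0);
    for (; i + 4 <= n; i += 4) {
        /* Each lane becomes all-ones where arr[i] < threshold, else zero. */
        uint32x4_t mask = vcltq_f32(vld1q_f32(arr + i), thr);
        /* Shift the mask down to 0 or 1 and accumulate per lane. */
        acc = vaddq_u32(acc, vshrq_n_u32(mask, 31));
    }
    /* Horizontal sum, then a single NEON -> ARM transfer for the whole array. */
    uint32x2_t sum2 = vadd_u32(vget_low_u32(acc), vget_high_u32(acc));
    sum2 = vpadd_u32(sum2, sum2);
    count = vget_lane_u32(sum2, 0);
#endif

    /* Scalar tail (up to 3 elements with NEON) or full fallback without it. */
    for (; i < n; i++)
        count += (arr[i] < threshold);
    return count;
}
```

If the per-element work cannot be expressed as vector operations, this approach does not apply directly, and the integer-comparison tricks in the other answers may be the better fit.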

+2

This example shows how bad the code generated by the compiler can be:

It loads the value using NEON only to convert it to int, then does a NEON->ARM transfer, which causes a pipeline flush and wastes 11 to 14 cycles.

The best solution would be to write the function entirely in assembly.

However, there is a simple trick that lets you compare floats quickly without typecasting AND without truncation:

If the threshold is positive (exactly as fast as the int comparison):

    void example(float * pSrc, float threshold, unsigned int count)
    {
      typedef union {
        int ival;
        unsigned int uval;
        float fval;
      } unitype;

      unitype v, t;
      if (count==0) return;
      t.fval = threshold;
      do {
        v.fval = *pSrc++;
        if (v.ival < t.ival) {   // true when *pSrc < threshold
          // your code here
        }
        else {
          // your code here (optional)
        }
      } while (--count);
    }

If the threshold is negative (one cycle more per value than the int comparison):

    void example(float * pSrc, float threshold, unsigned int count)
    {
      typedef union {
        int ival;
        unsigned int uval;
        float fval;
      } unitype;

      unitype v, t, temp;
      if (count==0) return;
      t.fval = threshold;
      t.uval &= 0x7fffffff;               // |threshold|
      do {
        v.fval = *pSrc++;
        temp.uval = v.uval ^ 0x80000000;  // flip the sign of the value
        if (temp.ival >= t.ival) {        // true when *pSrc <= threshold
          // your code here
        }
        else {
          // your code here (optional)
        }
      } while (--count);
    }

I think this will be much faster than the answer above. Once again, I'm too late.
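One way to check that both variants agree with the ordinary float comparison is to wrap each test in a small function and compare the results. The sketch below is not part of the answer: below_pos and at_or_below_neg are hypothetical names, and it assumes 32-bit IEEE-754 floats, a 32-bit int, and no NaNs. Note that the positive-threshold test is strict (value < threshold), while the negative-threshold test also accepts equality (value <= threshold).

```c
#include <assert.h>

typedef union {
    int ival;
    unsigned int uval;
    float fval;
} unitype;

static_assert(sizeof(int) == sizeof(float), "int and float must have the same size");

/* For threshold > 0: signed compare of the raw bits.
   Returns 1 when v < threshold. */
int below_pos(float v, float threshold)
{
    unitype a, t;
    a.fval = v;
    t.fval = threshold;
    return a.ival < t.ival;
}

/* For threshold < 0: clear the threshold's sign bit, flip the value's
   sign bit, then compare. Returns 1 when v <= threshold. */
int at_or_below_neg(float v, float threshold)
{
    unitype a, t;
    a.fval = v;
    t.fval = threshold;
    t.uval &= 0x7fffffffu;   /* |threshold| */
    a.uval ^= 0x80000000u;   /* -v */
    return a.ival >= t.ival;
}
```

A loop over sample values that asserts below_pos(x, 1.5f) == (x < 1.5f) and at_or_below_neg(x, -1.5f) == (x <= -1.5f) is enough to exercise both branches. A threshold of exactly zero is an edge case because of -0.0 and is worth handling separately.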

+2

If rounding errors do not matter, you should use std::lrint.

"Fast conversion from floating point to integers" recommends using it to convert a floating-point value to an int.

0

Source: https://habr.com/ru/post/914506/