Since you appear to be working on the ARM platform, you can implement abs in two instructions:
EORS r1, r1, r1, ASR #32
ADC r1, r1, #0
If you can tolerate an error of 1 in the result, drop the second instruction; what remains can be expressed in C:
int abs_almost_exact(int x) { return x ^ (x >> 31); } // assumes 32-bit int; a negative x yields -x - 1
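To make the size of the error concrete, here is a minimal sketch that puts the one-operation and two-operation versions side by side. The names `sign_mask` and `abs_exact` are mine, purely for illustration, and the code assumes that `>>` on a negative value is an arithmetic shift (implementation-defined in C, but what ARM compilers do).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: 0 for x >= 0, -1 (all ones) for x < 0.
   Assumes >> on a negative value is an arithmetic shift; C leaves this
   implementation-defined, but ARM compilers behave this way. */
static int32_t sign_mask(int32_t x) { return x >> 31; }

/* One operation: a negative x has all its bits flipped, giving -x - 1. */
int32_t abs_almost_exact(int32_t x) { return x ^ sign_mask(x); }

/* Two operations: flip the bits, then subtract the mask (adds 1 when x < 0). */
int32_t abs_exact(int32_t x) {
    assert(x != INT32_MIN); /* -INT32_MIN is not representable */
    int32_t m = sign_mask(x);
    return (x ^ m) - m;
}
```

So the shortcut is exact for non-negative inputs and one too small for negative ones, e.g. `abs_almost_exact(-5)` gives 4 while `abs_exact(-5)` gives 5.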
The bigger problem, however, is the loop. You are likely to benefit from unrolling it (since there is so little to do in each iteration):
do { // assuming len is even!
    int value1 = *p++;
    int value2 = *p++;
    value1 = abs(value1); // or replace abs by the hand-made version
    value2 = abs(value2);
    t |= value1;
    t |= value2;
    len -= 2; // two elements are consumed per iteration
} while (len > 0);
Note: I replaced while {} with do {} while, because the compiler I use (the ARM compiler) generates better code this way.
Also note that ARM has a latency of 2 clock cycles when loading short variables from memory (on the processor I worked with). So the minimum unroll factor is 3 (but you should unroll regardless).
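As a rough sketch of a deeper unroll, something like the following could be a starting point. The function name `or_of_abs` and the factor of 4 are my own choices for illustration, not something measured; the loads are grouped ahead of their first use so that the load latency has a chance to overlap, although the compiler is free to reorder them. A short tail loop removes the requirement that len be a multiple of the unroll factor.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: ORs together the absolute values of len shorts. */
int or_of_abs(const short *p, size_t len) {
    assert(p != NULL || len == 0);
    int t = 0;

    /* Main loop: 4 elements per iteration, all loads issued up front. */
    while (len >= 4) {
        int v0 = p[0];
        int v1 = p[1];
        int v2 = p[2];
        int v3 = p[3];
        t |= abs(v0); /* or replace abs by the hand-made version */
        t |= abs(v1);
        t |= abs(v2);
        t |= abs(v3);
        p += 4;
        len -= 4;
    }

    /* Tail: the 0 to 3 elements that did not fill a whole group. */
    while (len > 0) {
        t |= abs(*p++);
        len--;
    }
    return t;
}
```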
Oh, and does your processor support reading short (half-word) variables from memory? I have heard of some very old processors that cannot do this. If yours is one of them, you should change the code to load two values (one word) at once and use some bit manipulation to separate them.
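For the word-at-a-time variant, a minimal sketch could look like this. It assumes the buffer is 4-byte aligned and is handed over as 32-bit words, and that len (still counted in 16-bit values) is even; the name `or_of_abs_pairs` is hypothetical. Which half of the word comes first in memory depends on endianness, but that does not matter here because the results are only OR-ed together.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: words holds len 16-bit values, packed two per word.
   Assumes len is even and the buffer is 4-byte aligned. */
int or_of_abs_pairs(const uint32_t *words, size_t len) {
    assert(len % 2 == 0);
    int t = 0;
    for (size_t i = 0; i < len / 2; i++) {
        uint32_t w = words[i]; /* one load fetches two values */
        /* Split the word and sign-extend each half. The conversion to
           int16_t is assumed to wrap, as it does on ARM compilers. */
        int lo = (int16_t)(w & 0xFFFFu);
        int hi = (int16_t)(w >> 16);
        t |= abs(lo);
        t |= abs(hi);
    }
    return t;
}
```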