I want to calculate the magnitude and angle of 4 points using NEON SIMD instructions in ARM assembly. Most languages have a built-in library function for the angle (atan2; I am using C++), but it only handles a single pair of floating-point values (x and y). I would like to use SIMD instructions that operate on q registers to calculate atan2 for a vector of 4 values.
Accuracy does not need to be high; speed is more important.
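For reference, the scalar baseline I am trying to replace can be sketched roughly as follows. This is only an illustration: the function name `atan2_x4_scalar` and the use of `std::array` are assumptions made for the example, not code from my project.

```cpp
#include <array>
#include <cmath>
#include <cstddef>

// Hypothetical reference: one std::atan2 call per point, i.e. the scalar
// work that a 4-lane NEON routine would replace.
std::array<float, 4> atan2_x4_scalar(const std::array<float, 4>& y,
                                     const std::array<float, 4>& x) {
    std::array<float, 4> angle{};
    for (std::size_t i = 0; i < angle.size(); ++i) {
        angle[i] = std::atan2(y[i], x[i]);  // radians, in [-pi, pi]
    }
    return angle;
}
```

A vectorized version would ideally produce all 4 angles at once instead of looping over the lanes.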
I already have some assembly instructions that calculate the magnitude for 4 points held in floating-point registers, with acceptable accuracy for my application. q1 contains the 4 "x" values (x1, x2, x3, x4). q2 contains the 4 "y" values (y1, y2, y3, y4). q7 receives the 4 results, computed from the sums of squares (x1^2 + y1^2, x2^2 + y2^2, x3^2 + y3^2, x4^2 + y4^2).
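    vmul.f32    q7, q1, q1    @ q7 = x * x
    vmla.f32    q7, q2, q2    @ q7 = x * x + y * y
    vrecpe.f32  q7, q7        @ q7 ~= 1 / (x^2 + y^2)     (reciprocal estimate)
    vrsqrte.f32 q7, q7        @ q7 ~= 1 / sqrt(q7) = sqrt(x^2 + y^2)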
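To make the arithmetic explicit, here is a scalar C++ model of what those four instructions compute per lane. It is a sketch under assumptions: the function name `magnitude_x4_model` is hypothetical, and an exact division and square root stand in for the `vrecpe`/`vrsqrte` estimates, so the real NEON result is less precise than this model.

```cpp
#include <array>
#include <cmath>
#include <cstddef>

// Hypothetical scalar model of the four NEON instructions, lane by lane.
// Exact operations replace the hardware estimates (an assumption of this sketch).
std::array<float, 4> magnitude_x4_model(const std::array<float, 4>& x,
                                        const std::array<float, 4>& y) {
    std::array<float, 4> m{};
    for (std::size_t i = 0; i < m.size(); ++i) {
        float s = x[i] * x[i];       // vmul.f32    q7, q1, q1
        s += y[i] * y[i];            // vmla.f32    q7, q2, q2
        float r = 1.0f / s;          // vrecpe.f32  q7, q7  (estimate on hardware)
        m[i] = 1.0f / std::sqrt(r);  // vrsqrte.f32 q7, q7  (estimate on hardware)
    }
    return m;
}
```

The trick is that the reciprocal square root of a reciprocal is the square root itself: 1 / sqrt(1 / s) = sqrt(s), so the magnitude comes out without a dedicated square-root instruction.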
What is the fastest way to calculate the approximate atan2 for two vectors using SIMD instructions?
Ammar