You want the vrev64.16 instruction. However, it only reverses the 16-bit elements within each 64-bit double (D) register; it does not swap the two D registers that make up a quad (Q) register. You need an additional vswp for that.
With intrinsics,
q = vrev64q_u16(q)
should do the trick for reversing the elements inside each doubleword; in the quad-register case you then need to swap the two doublewords. This is a bit cumbersome because there is no direct vswp intrinsic, which forces you to use something like
q = vcombine_u16(vget_high_u16(q), vget_low_u16(q))
which actually ends up as a vswp instruction.
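The two intrinsic steps above can be wrapped in a small helper. This is only a minimal sketch: the name reverse8_u16 is hypothetical, and the scalar fallback is an addition (not part of the original answer) so that the snippet also compiles on targets without NEON.

```c
#include <stdint.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#define HAVE_NEON 1
#else
#define HAVE_NEON 0
#endif

/* Hypothetical helper: write the eight 16-bit values of `in` to `out`
 * in reverse order. */
static inline void reverse8_u16(const uint16_t in[8], uint16_t out[8])
{
#if HAVE_NEON
    uint16x8_t q = vld1q_u16(in);
    /* Step 1: reverse the four 16-bit lanes inside each 64-bit half. */
    q = vrev64q_u16(q);
    /* Step 2: swap the two 64-bit halves of the Q register. */
    q = vcombine_u16(vget_high_u16(q), vget_low_u16(q));
    vst1q_u16(out, q);
#else
    /* Assumption: plain scalar fallback for non-NEON builds. */
    for (int i = 0; i < 8; ++i)
        out[i] = in[7 - i];
#endif
}
```

Calling it on the values 0x101 through 0x108 should give 0x108 down to 0x101 on either code path.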
The following is an example.
#include <stdio.h>
which generates the assembly shown below
vld1.16 {d16-d17}, [sp:64]
movs r4, #0
vrev64.16 q8, q8
vswp d16, d17
vst1.16 {d16-d17}, [r5]
and displays
$ rev
0x108 0x107 0x106 0x105 0x104 0x103 0x102 0x101