Do you only need to manipulate 4x4 matrices? Most general-purpose linear algebra libraries have been highly optimized for large matrices with little attention to smaller ones. Part of the reason I wrote EJML was to solve this problem and encourage other developers to optimize for small matrices. EJML is the fastest for small matrices, but you can do better.
If you really need more performance, I would skip the usual suspects and instead roll my own highly specialized code. It should be possible to beat general-purpose libraries by a factor of several.
A simple example for a 2x2 matrix:
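    public class Matrix2x2 {
        double a11, a12, a21, a22;

        // Computes c = a * b. c must not be the same object as a or b,
        // since c is written while a and b are still being read.
        public static void mult( Matrix2x2 a , Matrix2x2 b , Matrix2x2 c ) {
            c.a11 = a.a11*b.a11 + a.a12*b.a21;
            c.a12 = a.a11*b.a12 + a.a12*b.a22;
            c.a21 = a.a21*b.a11 + a.a22*b.a21;
            c.a22 = a.a21*b.a12 + a.a22*b.a22;
        }
    }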
Note: I have not tried to compile this code; it is only meant as an example.
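The same idea carries over to the 4x4 case from the question. Below is a minimal sketch, assuming a hypothetical Matrix4x4 class with one field per element; the class and field names are made up for illustration and are not EJML's API. Whether this actually beats a library on your JVM is something only a benchmark can tell you.

```java
// Hypothetical fixed-size 4x4 matrix. The class and field names are
// illustrative only and are not taken from EJML or any other library.
class Matrix4x4 {
    double a11, a12, a13, a14;
    double a21, a22, a23, a24;
    double a31, a32, a33, a34;
    double a41, a42, a43, a44;

    // Computes c = a * b with every loop unrolled by hand.
    // c must be a different object from a and b, because c is
    // overwritten while a and b are still being read.
    static void mult(Matrix4x4 a, Matrix4x4 b, Matrix4x4 c) {
        c.a11 = a.a11*b.a11 + a.a12*b.a21 + a.a13*b.a31 + a.a14*b.a41;
        c.a12 = a.a11*b.a12 + a.a12*b.a22 + a.a13*b.a32 + a.a14*b.a42;
        c.a13 = a.a11*b.a13 + a.a12*b.a23 + a.a13*b.a33 + a.a14*b.a43;
        c.a14 = a.a11*b.a14 + a.a12*b.a24 + a.a13*b.a34 + a.a14*b.a44;

        c.a21 = a.a21*b.a11 + a.a22*b.a21 + a.a23*b.a31 + a.a24*b.a41;
        c.a22 = a.a21*b.a12 + a.a22*b.a22 + a.a23*b.a32 + a.a24*b.a42;
        c.a23 = a.a21*b.a13 + a.a22*b.a23 + a.a23*b.a33 + a.a24*b.a43;
        c.a24 = a.a21*b.a14 + a.a22*b.a24 + a.a23*b.a34 + a.a24*b.a44;

        c.a31 = a.a31*b.a11 + a.a32*b.a21 + a.a33*b.a31 + a.a34*b.a41;
        c.a32 = a.a31*b.a12 + a.a32*b.a22 + a.a33*b.a32 + a.a34*b.a42;
        c.a33 = a.a31*b.a13 + a.a32*b.a23 + a.a33*b.a33 + a.a34*b.a43;
        c.a34 = a.a31*b.a14 + a.a32*b.a24 + a.a33*b.a34 + a.a34*b.a44;

        c.a41 = a.a41*b.a11 + a.a42*b.a21 + a.a43*b.a31 + a.a44*b.a41;
        c.a42 = a.a41*b.a12 + a.a42*b.a22 + a.a43*b.a32 + a.a44*b.a42;
        c.a43 = a.a41*b.a13 + a.a42*b.a23 + a.a43*b.a33 + a.a44*b.a43;
        c.a44 = a.a41*b.a14 + a.a42*b.a24 + a.a43*b.a34 + a.a44*b.a44;
    }
}
```

The likely gain comes from having no loop counters, no bounds checks, and no array indirection, so the JIT sees a flat sequence of multiply-adds. The cost is that the code only works for one size and each operation has to be written out by hand (or generated).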