We're just in the process of porting our codebase over to Eigen 3.3 (quite an undertaking with all the 32-byte alignment issues). However, there are a few places where performance seems to have suffered badly, contrary to expectations (I was looking forward to some speed-up, given the additional support for FMA and AVX...). These include eigenvalue decomposition and matrix * matrix.transpose() * vector products. I've written two minimal working examples to demonstrate.
All tests are performed on an up-to-date Arch Linux system using an Intel Core i7-4930K processor (3.40 GHz), and compiled with g++ version 6.2.1.
1. Eigenvalue decomposition:
A simple self-adjoint eigenvalue decomposition takes twice as long with Eigen 3.3.0 as it does with 3.2.10.
File test_eigen_EVD.cpp:
#define EIGEN_DONT_PARALLELIZE
#include <Eigen/Dense>
#include <Eigen/Eigenvalues>

#define SIZE 200

using namespace Eigen;

int main (int argc, char* argv[])
{
  // repeat the self-adjoint EVD of the same random matrix 1000 times
  MatrixXf mat = MatrixXf::Random(SIZE,SIZE);
  SelfAdjointEigenSolver<MatrixXf> eig;

  for (int n = 0; n < 1000; ++n)
    eig.compute (mat);

  return 0;
}
Test results:
With Eigen 3.2.10:
g++ -march=native -O2 -DNDEBUG -isystem eigen-3.2.10 test_eigen_EVD.cpp -o test_eigen_EVD && time ./test_eigen_EVD

real    0m5.136s
user    0m5.133s
sys     0m0.000s
With Eigen 3.3.0:
g++ -march=native -O2 -DNDEBUG -isystem eigen-3.3.0 test_eigen_EVD.cpp -o test_eigen_EVD && time ./test_eigen_EVD

real    0m11.008s
user    0m11.007s
sys     0m0.000s
I'm not sure what might be causing this, but if anyone can see a way of maintaining performance with Eigen 3.3, I'd like to know about it!
2. matrix * matrix.transpose() * vector product:
This particular example takes over 200× longer with Eigen 3.3.0...
File test_eigen_products.cpp:
#define EIGEN_DONT_PARALLELIZE
#include <Eigen/Dense>

#define SIZE 200

using namespace Eigen;

int main (int argc, char* argv[])
{
  // repeat the matrix * matrix.transpose() * vector product 50 times
  MatrixXf mat = MatrixXf::Random(SIZE,SIZE);
  VectorXf vec = VectorXf::Random(SIZE);

  for (int n = 0; n < 50; ++n)
    vec = mat * mat.transpose() * VectorXf::Random(SIZE);

  return vec[0] == 0.0;
}
Test results:
With Eigen 3.2.10:
g++ -march=native -O2 -DNDEBUG -isystem eigen-3.2.10 test_eigen_products.cpp -o test_eigen_products && time ./test_eigen_products

real    0m0.040s
user    0m0.037s
sys     0m0.000s
With Eigen 3.3.0:
g++ -march=native -O2 -DNDEBUG -isystem eigen-3.3.0 test_eigen_products.cpp -o test_eigen_products && time ./test_eigen_products

real    0m8.112s
user    0m7.700s
sys     0m0.410s
Adding brackets to the line in the loop as follows:
vec = mat * ( mat.transpose() * VectorXf::Random(SIZE) );
makes a huge difference: both Eigen versions then perform equally well (3.3.0 is actually slightly better), and both are faster than the unbracketed 3.2.10 case. So there is a workaround. But it's strange that 3.3.0 should struggle so much with this.
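My guess (just an assumption on my part, I haven't dug into the Eigen internals) is that the bracketed version is roughly equivalent to evaluating the matrix-vector product into a temporary first (call it tmp here), so that no SIZE x SIZE matrix-matrix product is ever formed:

// roughly what the bracketed expression does: two O(SIZE^2)
// matrix-vector products, instead of an O(SIZE^3) matrix-matrix
// product followed by a matrix-vector product
VectorXf tmp = mat.transpose() * VectorXf::Random(SIZE);
vec = mat * tmp;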
I don't know whether this is a bug, but I guess it's worth reporting in case it's something that needs fixing. Or maybe I was just doing it wrong...
Any thoughts appreciated. Cheers, Donald.
EDIT
As ggael pointed out, the EVD in Eigen 3.3 is faster when compiled using clang++, or with -O3 with g++. So that's problem 1 sorted.
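For reference, that's just a matter of changing -O2 to -O3 in the g++ commands above, e.g. for the 3.3.0 EVD test:

g++ -march=native -O3 -DNDEBUG -isystem eigen-3.3.0 test_eigen_EVD.cpp -o test_eigen_EVD && time ./test_eigen_EVD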
Problem 2 isn't really a problem, since I can just add brackets to force the most efficient order of operations. But just for completeness: there does seem to be a flaw somewhere in the evaluation of these operations. Eigen is such an incredible piece of software, I think this probably deserves fixing. Here's a modified version of the MWE, just to show that it's unlikely to be related to the first temporary product being taken out of the loop (at least as far as I can tell):
#define EIGEN_DONT_PARALLELIZE
#include <Eigen/Dense>
#include <iostream>

#define SIZE 200

using namespace Eigen;

int main (int argc, char* argv[])
{
  VectorXf vec (SIZE);
  VectorXf vecsum = VectorXf::Zero(SIZE);   // start the accumulator at zero
  MatrixXf mat (SIZE,SIZE);

  for (int n = 0; n < 50; ++n) {
    // fresh operands on every iteration, results accumulated into vecsum
    mat = MatrixXf::Random(SIZE,SIZE);
    vec = VectorXf::Random(SIZE);
    vecsum += mat * mat.transpose() * VectorXf::Random(SIZE);
  }

  std::cout << vecsum.norm() << std::endl;
  return 0;
}
In this example, all the operands are initialised within the loop and the results accumulated in vecsum, so there's no way the compiler can pre-compute anything or optimise away unnecessary computations. This shows the same behaviour (this time testing with clang++ -O3, version 3.9.0):
$ clang++ -march=native -O3 -DNDEBUG -isystem eigen-3.2.10 test_eigen_products.cpp -o test_eigen_products && time ./test_eigen_products
5467.82

real    0m0.060s
user    0m0.057s
sys     0m0.000s

$ clang++ -march=native -O3 -DNDEBUG -isystem eigen-3.3.0 test_eigen_products.cpp -o test_eigen_products && time ./test_eigen_products
5467.82

real    0m4.225s
user    0m3.873s
sys     0m0.350s
Same result, but vastly different execution times. Thankfully this is easy to work around by placing the brackets in the right places, but there does seem to be a regression somewhere in how Eigen 3.3 evaluates these expressions. With brackets around the mat.transpose() * VectorXf::Random(SIZE) part, the execution time drops to about 0.020 s for both versions of Eigen (so Eigen 3.2.10 clearly also benefits in this case). At least this means we can keep getting awesome performance out of Eigen!
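For completeness, the bracketed accumulation line in the MWE above is simply:

vecsum += mat * ( mat.transpose() * VectorXf::Random(SIZE) );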
In the meantime, I'm accepting ggael's answer, which is all I needed to know to move forward.