Performance regression in Eigen 3.3.0 for sub-vector access
Submitted by Daniel Vollmer
Assigned to Nobody
Link to original bugzilla bug (#1342)
Version: 3.3 (current stable)
Platform: x86 - SSE
Description
Created attachment 749
Benchmark for the performance regression
In the attached benchmark, I observe a fairly significant performance difference between the 3.2 and 3.3 branches of Eigen: Eigen 3.3 is ~48% slower.
This is on an Intel(R) Xeon(R) CPU E3-1276 v3 @ 3.60GHz
using g++ 4.9.2 with the following compilation flags:
g++ -std=c++11 -Wno-deprecated -Ofast -DNDEBUG -fno-finite-math-only -I eigen eigen_bench.cpp
A large part of the performance difference goes away if I disable unaligned vectorization (-DEIGEN_UNALIGNED_VECTORIZE=0), but a small performance difference of ~9% remains. This remaining difference goes away when I completely disable vectorization (-DEIGEN_DONT_VECTORIZE).
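For anyone trying to reproduce this without changing the build flags, the same two switches can be set in the source instead. This is only a minimal sketch and is not taken from the attached benchmark; it assumes the defines appear before the first Eigen include in every translation unit, and that only one of the two is enabled at a time:

```cpp
// Configuration fragment: place before the first Eigen include.

// Equivalent to -DEIGEN_UNALIGNED_VECTORIZE=0:
// keeps vectorization, but not for unaligned accesses.
#define EIGEN_UNALIGNED_VECTORIZE 0

// Equivalent to -DEIGEN_DONT_VECTORIZE:
// disables explicit vectorization entirely (uncomment to use).
// #define EIGEN_DONT_VECTORIZE

#include <Eigen/Core>
```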
The run-times (multiple runs, fastest one taken) I see are as follows:
Eigen 3.2.10: 0.91s
Eigen 3.2.10: 0.91s (EIGEN_DONT_VECTORIZE)
Eigen 3.3.0 : 1.35s
Eigen 3.3.0 : 0.99s (EIGEN_UNALIGNED_VECTORIZE=0)
Eigen 3.3.0 : 0.91s (EIGEN_DONT_VECTORIZE)
Attachment 749, "Benchmark for the performance regression":
eigen_bench.cpp