Performance regression with gcc and AutoDiffScalar due to commit c01ff453
Summary
I'm seeing a performance regression between Eigen 3.4.0 and commit f5ead2d3 when compiling with gcc. The example below runs about 2 times slower when compiled against a recent Eigen master (f5ead2d3 in my case) than against Eigen 3.4.0. With clang there is virtually no difference between the two versions. Bisecting Eigen led me to commit c01ff453, and I verified that this commit is responsible for the regression in this case. I suspect the issue is not specific to AutoDiffScalar but rather lies in how the matrices that store the derivatives inside AutoDiffScalar are initialized and assigned. However, I wasn't able to produce an example that does not use AutoDiffScalar.
The godbolt link for the benchmark below also shows that gcc produces many additional move instructions with Eigen trunk compared to 3.4.0. The same difference appears when comparing commit c01ff453 with its parent af59ada0. Clang produces the same assembly for both versions.
Environment
- Operating System : Linux
- Architecture : x86_64
- Eigen Version : c01ff453
- Compiler Version : GCC 10 and newer
- Compile Flags : -O1 and higher
- Vector Extension : none
Minimal Example
https://godbolt.org/z/Mdvr3GnGa
#include <chrono>
#include <cmath>
#include <cstddef>
#include <iostream>
#include <Eigen/Dense>
#include <unsupported/Eigen/AutoDiff>

using Scalar = double;
// AD scalar that stores a single derivative in a fixed-size 1x1 matrix.
using AD = Eigen::AutoDiffScalar<Eigen::Matrix<Scalar, 1, 1>>;

// Function under test; the using-declarations let the calls resolve
// for plain double as well as for AD arguments.
template<typename T>
T Func(const T &x)
{
    using std::cos;
    using std::pow;
    using std::sqrt;
    return cos(2.0 * x) + sqrt(x) + pow(x, 0.7);
}

int main(){
    using std::chrono::high_resolution_clock;
    using std::chrono::duration_cast;
    using std::chrono::duration;
    using std::chrono::milliseconds;

    // Start at x = 1.0 with derivative dx/dx = 1.0.
    AD x{1.0, AD::DerType::Constant(1.0)};

    auto t1 = high_resolution_clock::now();
    AD y;
    for (std::size_t i = 0; i < 1000000; ++i)
    {
        y = Func(x);
        x += 0.1;
    }
    auto t2 = high_resolution_clock::now();

    auto ms_int = duration_cast<milliseconds>(t2 - t1);
    // Printing y keeps the loop result observable.
    std::cout << "y = " << y << std::endl;
    std::cout << ms_int.count() << "ms\n";
    return 0;
}
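To make clear what the benchmark computes, here is a minimal sketch of a hand-written first-order dual number that evaluates the same expression as `Func` without Eigen. The `sketch` namespace and the `Dual` type are hypothetical and are not Eigen's AutoDiffScalar implementation. They only mirror the arithmetic, and might serve as a baseline when checking values and derivatives.

```cpp
#include <cmath>

namespace sketch {

// Hypothetical stand-in for a first-order forward-mode AD scalar.
// This is NOT Eigen's AutoDiffScalar; it only mirrors the arithmetic.
struct Dual {
    double value;  // f(x)
    double deriv;  // df/dx
};

inline Dual operator+(const Dual &a, const Dual &b) {
    return {a.value + b.value, a.deriv + b.deriv};
}

inline Dual operator*(double c, const Dual &a) {
    return {c * a.value, c * a.deriv};
}

// Chain rule: d/dx cos(u) = -sin(u) * u'
inline Dual cos(const Dual &a) {
    return {std::cos(a.value), -std::sin(a.value) * a.deriv};
}

// Chain rule: d/dx sqrt(u) = u' / (2 * sqrt(u))
inline Dual sqrt(const Dual &a) {
    const double r = std::sqrt(a.value);
    return {r, a.deriv / (2.0 * r)};
}

// Chain rule for a constant exponent: d/dx u^p = p * u^(p-1) * u'
inline Dual pow(const Dual &a, double p) {
    return {std::pow(a.value, p), p * std::pow(a.value, p - 1.0) * a.deriv};
}

// Same expression as Func in the example above.
template <typename T>
T Func(const T &x) {
    using std::cos;
    using std::pow;
    using std::sqrt;
    return cos(2.0 * x) + sqrt(x) + pow(x, 0.7);
}

}  // namespace sketch
```

Calling `sketch::Func(sketch::Dual{1.0, 1.0})` evaluates the function and its derivative at x = 1, which corresponds to the seed used for `AD x` in the example above.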
Steps to reproduce
- Compile the example above with gcc (-O1 or higher) against Eigen at commit c01ff453
- Compile it the same way against the parent commit af59ada0
- Compare the run time and the generated assembly of the two builds
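When comparing the run times of the two builds, a single measurement can be noisy. The following is a minimal sketch of a best-of-N timing helper that could wrap the benchmark loop. `BestOfMs` is a hypothetical name, not part of Eigen or of the original benchmark, and it uses `std::chrono::steady_clock` rather than the `high_resolution_clock` from the example.

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <limits>

// Hypothetical helper: runs `work` `repeats` times and returns the
// duration of the fastest run in milliseconds.
template <typename Work>
double BestOfMs(Work &&work, std::size_t repeats = 5) {
    using Clock = std::chrono::steady_clock;
    double best = std::numeric_limits<double>::infinity();
    for (std::size_t r = 0; r < repeats; ++r) {
        const auto t1 = Clock::now();
        work();
        const auto t2 = Clock::now();
        // Convert the elapsed time to fractional milliseconds.
        const std::chrono::duration<double, std::milli> elapsed = t2 - t1;
        best = std::min(best, elapsed.count());
    }
    return best;
}
```

The loop from the example could be passed as a lambda that captures `x` and `y` by reference. Printing `y` afterwards, as the example does, keeps the result observable so the compiler cannot discard the computation.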
What is the current bug behavior?
Commit c01ff453 leads to a performance regression in AutoDiffScalar when using gcc.
What is the expected correct behavior?
Similar performance with both commits.