Massive speed degradation from Eigen 3.3.7 to 3.4.0

Summary

The execution speed of basic fixed-size Eigen operations (3x3 matrix-vector products, transposes, norms) degraded from 3.3.7 to 3.4.0 when compiled with Visual Studio 2022.

Environment

  • Operating System : Windows
  • Architecture : x64
  • Eigen Version : 3.3.7, 3.4.0, master
  • Compiler Version:
    • Microsoft (R) C/C++ Optimizing Compiler Version 19.37.32825 for x64
    • g++ (Ubuntu 12.3.0-1ubuntu1~22.04) 12.3.0 (WSL 2)
  • Compile Flags : /O2 (MSVC), -O3 (g++)
  • Vector Extension : not specified explicitly (compiler defaults, presumably SSE/AVX)
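Because no vector extension was set explicitly, it may help to record what the compiler actually enabled. The sketch below is a minimal check based only on standard predefined compiler macros. `compiled_vector_extension` is a hypothetical helper and not part of the linked repository. To my knowledge, without /arch:AVX (MSVC) or -mavx/-march=... (g++), both compilers default to SSE2 on x64. Eigen itself also provides `Eigen::SimdInstructionSetsInUse()`, which reports the instruction sets Eigen vectorizes with.

```cpp
#include <string>

// Hypothetical helper: reports the widest x86 vector extension the compiler
// was allowed to use, judged purely from predefined macros.
// GCC/Clang define __SSE2__, __AVX__, __AVX2__, __AVX512F__ as enabled.
// MSVC defines __AVX__/__AVX2__/__AVX512F__ for the matching /arch option;
// on x64 (_M_X64) SSE2 is always available.
std::string compiled_vector_extension() {
#if defined(__AVX512F__)
  return "AVX-512F";
#elif defined(__AVX2__)
  return "AVX2";
#elif defined(__AVX__)
  return "AVX";
#elif defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
  return "SSE2";
#else
  return "none detected";
#endif
}
```

Compiling this with exactly the benchmark's flags (/O2 or -O3) and printing the result would make the "Vector Extension" entry precise for both toolchains.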

Minimal Example

https://gitlab.com/lbenner/eigenspeedtest

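#include <benchmark/benchmark.h>
#include <Eigen/Dense>
#include <cstdlib>

// `results` is defined elsewhere in the linked repository; it is declared
// here as a global array (size assumed) so the snippet compiles on its own.
double results[2];

// Fills a 3x3 matrix, computes A*v and A^T*v, and feeds the result back
// into the next iteration through `var`.
static void BM_EigenCheck(benchmark::State& state)
{
  double res = 0.0;
  
  Eigen::Matrix3d A;

  // Runtime value so the compiler cannot constant-fold the matrix.
  double var = (std::rand() % 1000) / 10.0;

  for (auto _ : state)
  {
    A << 1.0, 2.0, 3.0,
      4.0, var, 6.0,
      7.0, 8.0, 9.0;

    Eigen::Vector3d v(1.0, 2.0, 3.0);

    Eigen::Vector3d x = A * v;

    Eigen::Vector3d y = A.transpose() * v;

    Eigen::Vector3d d = x - y;

    var = A(1,1) + 0.00001;

    double r = d.norm();

    A.row(1) -= r * v;

    res = A(1, 1);
  }

  // Store the final value globally so the loop is not optimized away.
  results[1] = res;
}

// Register and run the benchmark (the repository may do this elsewhere).
BENCHMARK(BM_EigenCheck);
BENCHMARK_MAIN();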
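A hand-written baseline that performs the same arithmetic without Eigen could help separate an Eigen regression from a compiler issue. If this version stays in the 3 ns range under MSVC, the slowdown would point to Eigen 3.4's code path rather than to the compiler. The sketch below is minimal and assumes `std::array` storage (row-major here, unlike Eigen's default column-major, which does not change the results). `eigen_check_reference` is a hypothetical name, not part of the linked repository.

```cpp
#include <array>
#include <cmath>

using Mat3 = std::array<std::array<double, 3>, 3>;  // row-major 3x3
using Vec3 = std::array<double, 3>;

// Hypothetical scalar mirror of the BM_EigenCheck loop body.
// Runs `iters` iterations and returns the final A(1,1), i.e. the value
// BM_EigenCheck stores in `res`.
double eigen_check_reference(double var, long iters) {
  Mat3 A{};
  double res = 0.0;
  for (long it = 0; it < iters; ++it) {
    A = {{{1.0, 2.0, 3.0}, {4.0, var, 6.0}, {7.0, 8.0, 9.0}}};
    const Vec3 v{1.0, 2.0, 3.0};

    Vec3 x{};  // x = A * v
    Vec3 y{};  // y = A^T * v
    for (int i = 0; i < 3; ++i) {
      for (int j = 0; j < 3; ++j) {
        x[i] += A[i][j] * v[j];
        y[i] += A[j][i] * v[j];
      }
    }

    double sq = 0.0;  // squared norm of d = x - y
    for (int i = 0; i < 3; ++i) {
      const double d = x[i] - y[i];
      sq += d * d;
    }

    var = A[1][1] + 0.00001;
    const double r = std::sqrt(sq);                    // r = d.norm()
    for (int j = 0; j < 3; ++j) A[1][j] -= r * v[j];   // A.row(1) -= r * v
    res = A[1][1];
  }
  return res;
}
```

For example, `eigen_check_reference(5.0, 1)` returns 5 - 2*sqrt(336) up to rounding. The diagonal entry `var` cancels in d = (A - A^T) v, so r = sqrt(336) in every iteration.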

Steps to reproduce

  1. git clone https://gitlab.com/lbenner/eigenspeedtest.git
  2. cd eigenspeedtest
  3. git submodule update --init --recursive
  4. mkdir build && cd build
  5. cmake -G Ninja -DCMAKE_BUILD_TYPE=Release ..
  6. cmake --build .
  7. Run the EigenSpeed executable

To switch between Eigen versions, run

git checkout <eigen branch or tag>

inside the Eigen submodule directory and rebuild. Eigen is header-only, so the benchmark must be recompiled for the change to take effect.
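Because Eigen is header-only, a stale build can silently keep the old version. A minimal sketch for confirming which version a translation unit actually compiled against follows. `eigen_version_string` is a hypothetical helper. The three version macros are defined by Eigen's own headers.

```cpp
#include <string>

// Pull in Eigen only if it is on the include path, so the sketch also
// builds without it (__has_include is standard since C++17).
#if defined(__has_include)
#  if __has_include(<Eigen/Core>)
#    include <Eigen/Core>
#  endif
#endif

// Hypothetical helper: returns the Eigen version seen by this translation
// unit, built from EIGEN_WORLD_VERSION, EIGEN_MAJOR_VERSION and
// EIGEN_MINOR_VERSION.
std::string eigen_version_string() {
#if defined(EIGEN_WORLD_VERSION) && defined(EIGEN_MAJOR_VERSION) && defined(EIGEN_MINOR_VERSION)
  return std::to_string(EIGEN_WORLD_VERSION) + "." +
         std::to_string(EIGEN_MAJOR_VERSION) + "." +
         std::to_string(EIGEN_MINOR_VERSION);
#else
  return "Eigen headers not found";
#endif
}
```

Printing this string at benchmark start-up would make each result block self-identifying. For the 3.3.7 tag it should yield "3.3.7".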

What is the current bug behavior?

With Visual Studio 2022, BM_EigenCheck slows down by close to an order of magnitude from Eigen 3.3.7 to 3.4.0 and master. With g++ 12.3 it is essentially unaffected.

Wall-clock time per iteration of BM_EigenCheck (the "Time" column of the raw results):

| Eigen version | VS 2022 | g++ 12.3 |
|---------------|---------|----------|
| 3.3.7         | 2.94 ns | 20.0 ns  |
| 3.4.0         | 24.9 ns | 20.3 ns  |
| master        | 26.1 ns | 20.4 ns  |

The Eigen version was the only difference; all other settings remained unchanged.
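The following slowdown factors relative to 3.3.7 are computed from the wall-clock Time column of the raw Google Benchmark output (2.94 ns, 24.9 ns and 26.1 ns for Visual Studio; 20.0 ns, 20.3 ns and 20.4 ns for g++):

```latex
\text{VS 2022:}\quad
\frac{t_{3.4.0}}{t_{3.3.7}} = \frac{24.9}{2.94} \approx 8.5,
\qquad
\frac{t_{\text{master}}}{t_{3.3.7}} = \frac{26.1}{2.94} \approx 8.9

\text{g++ 12.3:}\quad
\frac{t_{3.4.0}}{t_{3.3.7}} = \frac{20.3}{20.0} = 1.015,
\qquad
\frac{t_{\text{master}}}{t_{3.3.7}} = \frac{20.4}{20.0} = 1.02
```

Visual Studio therefore loses close to an order of magnitude, while g++ stays within about 2 %.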

What is the expected correct behavior?

For the provided benchmark, Eigen 3.4.0 and later should perform as well as 3.3.7. (We also tested other 3.3.x versions; all performed comparably.)

Ideally, the g++ build would also be as fast as the Visual Studio build with Eigen 3.3.7.

Relevant logs

No logs attached.

Benchmark scripts and results

The code is provided in the linked GitLab repository. The focus is on the BM_EigenCheck benchmark. g++ is probably clever enough to optimize the other benchmark, BM_FillMatrix3d, away entirely (it reports 0.000 ns). For Visual Studio, the BM_FillMatrix3d result did not change between Eigen versions.
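Since dead-code elimination is a concern here, a rough timing that does not depend on Google Benchmark could serve as a cross-check. The sketch below is minimal, built on `std::chrono::steady_clock`. `ns_per_iteration` is a hypothetical helper. Unlike Google Benchmark, it does no warm-up and no repetitions, so treat its numbers as indicative only.

```cpp
#include <chrono>
#include <cstddef>

// Hypothetical helper: average wall-clock nanoseconds per call of `body`.
// Each result is written to a volatile, which forces one store per
// iteration. A body whose result is loop-invariant may still be hoisted
// out of the loop, so `body` should carry state between calls (as
// BM_EigenCheck does through `var`).
template <class Body>
double ns_per_iteration(Body body, std::size_t iterations) {
  volatile double sink = 0.0;
  const auto start = std::chrono::steady_clock::now();
  for (std::size_t i = 0; i < iterations; ++i) {
    sink = body();
  }
  const auto stop = std::chrono::steady_clock::now();
  const std::chrono::duration<double, std::nano> elapsed = stop - start;
  return iterations ? elapsed.count() / static_cast<double>(iterations) : 0.0;
}
```

For example, wrapping one iteration of the BM_EigenCheck loop body in a lambda that captures `A` and `var` by reference and returns `A(1, 1)` would time the same work outside the benchmark framework.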

Visual Studio 2022 - Eigen 3.3.7

----------------------------------------------------------
Benchmark                Time             CPU   Iterations
----------------------------------------------------------
BM_FillMatrix3d      0.203 ns        0.188 ns   1000000000
BM_EigenCheck         2.94 ns         2.34 ns    280000000

Visual Studio 2022 - Eigen 3.4.0

----------------------------------------------------------
Benchmark                Time             CPU   Iterations
----------------------------------------------------------
BM_FillMatrix3d      0.201 ns        0.109 ns   1000000000
BM_EigenCheck         24.9 ns         11.5 ns     74666667

Visual Studio 2022 - Eigen master

----------------------------------------------------------
Benchmark                Time             CPU   Iterations
----------------------------------------------------------
BM_FillMatrix3d      0.208 ns        0.172 ns   1000000000
BM_EigenCheck         26.1 ns         21.8 ns     34461538

Linux g++ - Eigen 3.3.7

----------------------------------------------------------
Benchmark                Time             CPU   Iterations
----------------------------------------------------------
BM_FillMatrix3d      0.000 ns        0.000 ns   1000000000
BM_EigenCheck         20.0 ns         20.0 ns     35157117

Linux g++ - Eigen 3.4.0

----------------------------------------------------------
Benchmark                Time             CPU   Iterations
----------------------------------------------------------
BM_FillMatrix3d      0.000 ns        0.000 ns   1000000000
BM_EigenCheck         20.3 ns         20.3 ns     34153155

Linux g++ - Eigen master

----------------------------------------------------------
Benchmark                Time             CPU   Iterations
----------------------------------------------------------
BM_FillMatrix3d      0.000 ns        0.000 ns   1000000000
BM_EigenCheck         20.4 ns         20.4 ns     34680626
Edited by Lars Benner