Bug in complex GEMV in OpenMP parallel-for region (gcc-14.2)

Summary

Using the gcc-14.2 compiler with optimization flags, performing a complex GEMV-product of the form y.noalias() += A*x inside a #pragma omp parallel for schedule(dynamic) region results in unexpected behaviour. More specifically, it seems optimizations are occuring that translate this operation to a GEMV of the form y := alpha*A*x + beta*y where alpha=0 instead of alpha=1.

Environment

  • Operating System : Linux (Ubuntu 22.04)
  • Architecture : x86-64
  • Eigen Version : 3.4.0
  • Compiler Version : GCC 14.2 (but also versions as old as GCC 13.1)
  • Compile Flags : -O3 -DNDEBUG -fopenmp

Minimal Example

Below a smaller snippet. A longer example in godbolt: https://godbolt.org/z/eeWsso7Kj.

// #define EIGEN_USE_BLAS
// #define EIGEN_USE_LAPACKE
#include <iostream>
#include <omp.h>
#include <Eigen/Dense>

using namespace Eigen ;
template<typename Scalar>
using Mat = Matrix<Scalar,Eigen::Dynamic,Eigen::Dynamic>;
template<typename Scalar>
using Vec = Vector<Scalar,Eigen::Dynamic> ;

using value_t = double ; // float
using complex_t = std::complex<value_t> ;

int main() {
    int m = 10, k = 3 ;

    #pragma omp parallel for num_threads(1) schedule(dynamic) // guided
    for (int i=0 ; i<1 ; ++i) {
        Mat<complex_t> A = Mat<complex_t>::Random(m,k) ;
        Vec<complex_t> t = Vec<complex_t>::Random(k) ;
        Vec<complex_t> y0 = Vec<complex_t>::Random(m) ;
        Vec<complex_t> y1 = y0 ;

        y1.noalias() += A*t ;
        std::cout << (y1(0)-y0(0))/y0(0) << std::endl ;
    }
    return 0;
}

Steps to reproduce

  1. Compile the snippet above with GCC 14.2 or use the godbolt link for a larger example and compile with GCC 13.1 up to GCC 14.2.
  2. Use optimization flags '-O3 -DNDEBUG -fopenmp'. You can add more aggressive optimization flags as well.
  3. The bug disappears in a few cases: (1) when not using optimization flags; (2) when using an older GCC version or the latest version (GCC trunk); (3) not using a complex scalar type; (4) using static scheduling.
  4. The bug still persists when: BLAS/LAPACKE is enabled, complex<float> is used and/or scheduling is changed to 'guided'.

What is the current bug behavior?

Consider the smaller snippet above. The line y1.noalias() += A*t ; transforms into a GEMV-call of the form y := alpha*A*x + beta*y. In certain cases, the bug causes alpha to equal zero which results in y1 staying equal to y0. The print statement then evaluates to zero.

What is the expected correct behavior?

We expect alpha to equal one and y1 to be different from y0, i.e. the print statement should not return zero. The fact that all the operations are defined in a parallel-for region of OpenMP should not be an issue as the scope of all the variables is restricted to one iteration of the for-loop, i.e. they are private.

Relevant additional info

The aformentioned line eventually results in a call to gemv_dense_selector<OnTheRight,ColMajor,true>::run on line 214 of Eigen/src/Core/GeneralProduct.h. I was able to add a print-statement showing the value of actualAlpha on line 231, with the bug still persisting. In this case, its value showed to be zero, hence my previous explanation.

As mentioned above, this bug only seems to arise when using GCC 13.1-14.2, not when using e.g. GCC 12.4 or GCC trunk in godbolt. Other compilers also don't seem to have this issue. This may indicate that the problem is not actually with Eigen itself.

Plan of action

I am not really sure. I do not seem to get any further in my bug hunt and wanted to raise the issue here to get some feedback from somebody else on a possible course of action.

Edited by Kobe Bruyninckx