Tensor Broadcast bug on GCC and Clang with -mfma

Summary

Broadcasting an Eigen::Tensor<std::complex<double>, 1> silently gives the wrong result: there are no warnings or errors at compile time or at runtime. The problem is not caught by -fsanitize=address.

Update: The error is caught by -fsanitize=address if the broadcast factor is increased from 2 to 4. See it on Compiler Explorer.

Environment

  • Operating System : Linux
  • Architecture : x64
  • Eigen Version : 3.4.0
  • Compiler Version : GCC 10.1, GCC 11.2, Clang 13.0
  • Compile Flags : -std=c++17 -mfma

Minimal Example

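#include <array>
#include <complex>
#include <cstdio>
#include <string_view>
#include <unsupported/Eigen/CXX11/Tensor>

// Print a label followed by every entry of a rank-1 complex tensor.
void print_tensor(const Eigen::Tensor<std::complex<double>,1> & L, std::string_view msg){
    // A string_view is not guaranteed to be null-terminated, so pass its length explicitly.
    std::printf("%.*s\n", static_cast<int>(msg.size()), msg.data());
    for(long i = 0; i < L.size(); i++) std::printf("(%.16f, %.16f)\n",L[i].real(), L[i].imag());
}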


int main() {
    Eigen::Tensor<std::complex<double>,1> L(1);
    L.setConstant(1.0);

    print_tensor(L, "L");

    std::array<long,1> bcast = {2};
    Eigen::Tensor<std::complex<double>,1> Lb = L.broadcast(bcast); // Error happens here

    print_tensor(Lb, "L.broadcast({2})");

}

See it fail live on compiler explorer

Note that it works fine if one replaces std::complex<double> with double, as seen here.

Relevant logs

The program above outputs:

L
(1.0000000000000000, 0.0000000000000000)
L.broadcast({2})
(1.0000000000000000, 0.0000000000000000)
(0.0000000000000000, 0.0000000000000000)

The error is in the last line, which should read (1.0000000000000000, 0.0000000000000000).

Steps to reproduce

  1. Compile the code above with the compiler flags -std=c++17 and -mfma (or -march=native; my machine has a Skylake CPU), then run it.

What is the current bug behavior?

Broadcasting a rank-1 tensor of type std::complex<double> with contents

(1.0, 0.0)

by {2}, gives

(1.0, 0.0),
(0.0, 0.0)

In fact, the last entry appears to be uninitialized memory: occasionally, under circumstances I cannot reproduce reliably, it holds a huge value that changes from run to run.
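Because the bad entry is sometimes zero and sometimes garbage, checking the result by eye is unreliable. The sketch below is one possible way to check a broadcast result mechanically. It uses plain C++ without Eigen, and the helper name count_broadcast_mismatches is hypothetical. It assumes the values have been copied out of the tensors into std::vector, and that entry i of a rank-1 broadcast should equal entry i % n of the n-element input.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

// Hypothetical checker (not part of Eigen): counts the entries of `out`
// that differ from the value a rank-1 broadcast of `src` should hold there.
// Assumption: out[i] is expected to equal src[i % src.size()].
std::size_t count_broadcast_mismatches(
    const std::vector<std::complex<double>>& src,
    const std::vector<std::complex<double>>& out) {
    assert(!src.empty());
    std::size_t mismatches = 0;
    for (std::size_t i = 0; i < out.size(); ++i) {
        // NaN garbage also compares unequal, so it is counted as a mismatch.
        if (out[i] != src[i % src.size()]) ++mismatches;
    }
    return mismatches;
}
```

Applied to the output reported above, with src = {(1.0, 0.0)} and out = {(1.0, 0.0), (0.0, 0.0)}, this reports one mismatch. A correct broadcast would report zero.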

What is the expected correct behavior?

The operation above should replicate the initial value, giving

(1.0, 0.0),
(1.0, 0.0)
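
For comparison, the expected semantics can be written as a plain loop. This is only a reference sketch in standard C++, not Eigen's implementation, and the helper name broadcast_1d is hypothetical. It tiles a rank-1 input a given number of times, which is what L.broadcast({2}) should produce for a one-element tensor.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

// Hypothetical reference helper (not part of Eigen): repeats a rank-1
// input `factor` times back to back, mirroring a rank-1 broadcast.
std::vector<std::complex<double>> broadcast_1d(
    const std::vector<std::complex<double>>& src, std::size_t factor) {
    std::vector<std::complex<double>> out;
    out.reserve(src.size() * factor);
    for (std::size_t rep = 0; rep < factor; ++rep) {
        // Append one full copy of the input per repetition.
        out.insert(out.end(), src.begin(), src.end());
    }
    assert(out.size() == src.size() * factor);
    return out;
}
```

Calling broadcast_1d on {(1.0, 0.0)} with a factor of 2 yields two copies of (1.0, 0.0), matching the expected output above.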
Edited by DavidAce