Tensor Broadcast bug on GCC and Clang with -mfma

Summary

Broadcasting an Eigen::Tensor<std::complex<double>, 1> gives the wrong result, with no warnings or errors at compile time or at runtime. The problem is not caught by -fsanitize=address.

Update: the error is caught by -fsanitize=address if the broadcast factor is increased from 2 to 4. See it on Compiler Explorer.

Environment

Operating System : Linux
Architecture : x64
Eigen Version : 3.4.0
Compiler Version : GCC 10.1, GCC 11.2, Clang 13.0
Compile Flags : -std=c++17 -mfma

Minimal Example

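```cpp
#include <array>
#include <complex>
#include <cstdio>
#include <string_view>
#include <unsupported/Eigen/CXX11/Tensor>

// Prints a label followed by every element of a rank-1 complex tensor.
void print_tensor(const Eigen::Tensor<std::complex<double>,1> & L, std::string_view msg){
    // string_view is not guaranteed to be null-terminated, so pass its length explicitly.
    std::printf("%.*s\n", static_cast<int>(msg.size()), msg.data());
    for(long i = 0; i < L.size(); i++) std::printf("(%.16f, %.16f)\n",L[i].real(), L[i].imag());
}

int main() {
    // A one-element tensor holding (1.0, 0.0).
    Eigen::Tensor<std::complex<double>,1> L(1);
    L.setConstant(1.0);

    print_tensor(L, "L");

    // Repeat the tensor twice along dimension 0; both entries should be (1.0, 0.0).
    std::array<long,1> bcast = {2};
    Eigen::Tensor<std::complex<double>,1> Lb = L.broadcast(bcast); // Error happens here

    print_tensor(Lb, "L.broadcast({2})");
}
```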

See it fail live on Compiler Explorer.

Note that the broadcast works correctly if std::complex<double> is replaced with double.

Relevant logs

The program above outputs:

L
(1.0000000000000000, 0.0000000000000000)
L.broadcast({2})
(1.0000000000000000, 0.0000000000000000)
(0.0000000000000000, 0.0000000000000000)

The error is in the last line, which should be (1.0000000000000000, 0.0000000000000000).

Steps to reproduce

Compile the code above with the flags -std=c++17 and -mfma (or with -march=native, as on my machine, which has a Skylake CPU).
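You can confirm that a given build actually has FMA enabled by compiling a small check with the same flags. GCC and Clang predefine the `__FMA__` macro when FMA code generation is on, whether through -mfma or through -march=native on an FMA-capable CPU. The helper `describe_build` below is hypothetical and not part of Eigen. It is a minimal sketch that reports only the compiler and the FMA status.

```cpp
#include <string>

// Hypothetical helper: describes the compiler that built this translation unit
// and whether FMA code generation was enabled (GCC and Clang define __FMA__).
inline std::string describe_build() {
    std::string s;
#if defined(__clang__)
    // Check Clang first, because Clang also defines __GNUC__.
    s += "Clang " + std::to_string(__clang_major__) + "." + std::to_string(__clang_minor__);
#elif defined(__GNUC__)
    s += "GCC " + std::to_string(__GNUC__) + "." + std::to_string(__GNUC_MINOR__);
#else
    s += "unknown compiler";
#endif
#if defined(__FMA__)
    s += ", FMA enabled";
#else
    s += ", FMA disabled";
#endif
    return s;
}
```

Printing `describe_build()` next to the repro output makes it clear whether a passing or failing run was built with FMA.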

What is the current bug behavior?

Broadcasting a rank-1 tensor of type std::complex<double> with contents

(1.0, 0.0)

by {2} gives

(1.0, 0.0),
(0.0, 0.0)

In fact, the last entry appears to be read from uninitialized memory: under circumstances I have not been able to reproduce reliably, it holds a huge, nondeterministic value.
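Because the wrong value is not always zero, it can help to check every entry programmatically instead of reading the printed output. The function `first_broadcast_mismatch` below is a hypothetical diagnostic, not an Eigen API. It is a minimal sketch that assumes rank-1 broadcasting by `factor` should satisfy `out[i] == src[i % n]`, and it works on raw contiguous buffers.

```cpp
#include <complex>

// Hypothetical diagnostic: returns the index of the first element of `out`
// (length n * factor) that differs from the tiled source `src` (length n),
// or -1 if every element matches, i.e. out[i] == src[i % n] for all i.
inline long first_broadcast_mismatch(const std::complex<double>* src, long n,
                                     const std::complex<double>* out, long factor) {
    for (long i = 0; i < n * factor; ++i) {
        if (out[i] != src[i % n]) return i;  // exact comparison: values are copied, not computed
    }
    return -1;
}
```

In the minimal example you could call it as `first_broadcast_mismatch(L.data(), L.size(), Lb.data(), bcast[0])`. Given the output reported above, it would return 1.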

What is the expected correct behavior?

The operation above should replicate the initial value, giving

(1.0, 0.0),
(1.0, 0.0)
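For comparison, the sketch below shows the expected result computed with plain loops, which avoids Eigen's vectorized evaluator entirely. `broadcast_reference` is a hypothetical helper. It assumes the rank-1 semantics described above: the input is repeated (tiled) `factor` times.

```cpp
#include <complex>
#include <cstddef>
#include <vector>

// Hypothetical reference for rank-1 broadcasting: the result has
// in.size() * factor elements, and element i equals in[i % in.size()].
inline std::vector<std::complex<double>>
broadcast_reference(const std::vector<std::complex<double>>& in, std::size_t factor) {
    std::vector<std::complex<double>> out;
    out.reserve(in.size() * factor);
    for (std::size_t rep = 0; rep < factor; ++rep) {
        out.insert(out.end(), in.begin(), in.end());  // append one full copy of the input
    }
    return out;
}
```

For the input (1.0, 0.0) and a factor of 2, this yields two copies of (1.0, 0.0), matching the expected behavior above.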

Edited Oct 18, 2021 by DavidAce