operator== causes CUDA kernel launch failure when compiled with C++17

Summary

When compiled with -std=c++17, comparing two Eigen vectors with operator== inside a CUDA kernel causes the kernel launch to fail.

Environment

  • Operating System : Ubuntu 20.04
  • Architecture : x86_64
  • Eigen Version : 3.3.9
  • Compiler Version : gcc 9.3.0
  • Compile Flags : -O3
  • Vector Extension : None
  • CUDA Version : 11.1 Update 1

Minimal Example

#include <Eigen/Dense>
#include <iostream>

#if defined(__GNUC__)
#define SafeCall(expr) ___SafeCall(expr, __FILE__, __LINE__, __func__)
#else
#define SafeCall(expr) ___SafeCall(expr, __FILE__, __LINE__)
#endif

static inline void error(const char *error_string, const char *file, const int line, const char *func)
{
    std::cout << "Error: " << error_string << "\t" << file << ":" << line << std::endl;
    exit(1); // non-zero status so the failure is visible to the caller
}

static inline void ___SafeCall(cudaError_t err, const char *file, const int line, const char *func = "")
{
    if (cudaSuccess != err)
        error(cudaGetErrorString(err), file, line, func);
}

__global__ void test_kerenel()
{
    int i = blockDim.x * blockIdx.x + threadIdx.x;
    Eigen::Vector3i a, b;
    bool result = (a == b); // triggers the launch failure
    // auto result2 = a.cwiseEqual(b); // no error
    // auto result3 = a.cwiseEqual(b).all(); // also triggers the launch failure
    // auto result4 = a.all(); // no error
    printf("%d\n", i);
}

int main(int argc, char **)
{
    test_kerenel<<<3, 10>>>();
    SafeCall(cudaDeviceSynchronize());
    SafeCall(cudaGetLastError());
}

Steps to reproduce

  1. Create a test.cu file from the above code.
  2. Compile it with nvcc -o test test.cu -I/usr/local/include/eigen3 -std=c++17
  3. Run ./test

What is the current bug behavior?

The program prints Error: unspecified launch failure test.cu:36 and exits. Line 36 of test.cu is the cudaDeviceSynchronize() call in main.

What is the expected correct behavior?

The program should print the numbers 0 through 29, one per thread, in no guaranteed order.

Anything else that might help

Changing the compiler flag from -std=c++17 to -std=c++11 makes the program run without the error.
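
The commented-out lines in the kernel suggest that the failure comes from the .all() reduction applied to the result of cwiseEqual(), since cwiseEqual() alone and a.all() alone both run fine. A possible stopgap until this is fixed might therefore be to compare the coefficients with a plain loop instead of operator==. The sketch below is only an assumption: coeffs_equal and HOST_DEVICE are hypothetical names, not part of Eigen, and it has not been verified that this actually avoids the launch failure under -std=c++17.

```cpp
// Expands to CUDA qualifiers under nvcc and to nothing in a host-only build.
#ifdef __CUDACC__
#define HOST_DEVICE __host__ __device__
#else
#define HOST_DEVICE
#endif

// Hypothetical helper (not part of Eigen): compares the first n coefficients
// of a and b with a plain loop, relying only on operator[].
template <typename Vec>
HOST_DEVICE bool coeffs_equal(const Vec &a, const Vec &b, int n)
{
    for (int k = 0; k < n; ++k)
    {
        if (a[k] != b[k])
            return false; // first mismatch decides the result
    }
    return true;
}
```

In the kernel above, this would replace the failing line with bool result = coeffs_equal(a, b, 3);. The helper is templated so that it can also be tried on the host with plain arrays, independently of Eigen and CUDA.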