operator== causes CUDA kernel launch failure when compiled with c++17
Summary
When two Eigen vectors are compared with operator== inside a CUDA kernel and the code is compiled with -std=c++17, the kernel fails with an "unspecified launch failure".
Environment
- Operating System : Ubuntu 20.04
- Architecture : x86_64
- Eigen Version : 3.3.9
- Compiler Version : gcc 9.3.0
- Compile Flags : -O3
- Vector Extension : None
- CUDA Version : 11.1 Update 1
Minimal Example
#include <Eigen/Dense>
#include <cstdlib>
#include <iostream>

#if defined(__GNUC__)
#define SafeCall(expr) ___SafeCall(expr, __FILE__, __LINE__, __func__)
#else
#define SafeCall(expr) ___SafeCall(expr, __FILE__, __LINE__)
#endif

// Print the CUDA error string with its source location, then abort.
static inline void error(const char *error_string, const char *file, const int line, const char *func)
{
    std::cout << "Error: " << error_string << "\t" << file << ":" << line << std::endl;
    std::exit(EXIT_FAILURE);
}

// Check the return value of a CUDA runtime call.
static inline void ___SafeCall(cudaError_t err, const char *file, const int line, const char *func = "")
{
    if (cudaSuccess != err)
        error(cudaGetErrorString(err), file, line, func);
}

__global__ void test_kernel()
{
    int i = blockDim.x * blockIdx.x + threadIdx.x;
    Eigen::Vector3i a(1, 2, 3), b(1, 2, 3);
    bool result = (a == b);                  // causing error
    // auto result2 = a.cwiseEqual(b);       // no error
    // auto result3 = a.cwiseEqual(b).all(); // causing error
    // auto result4 = a.all();               // no error
    (void)result;                            // silence unused-variable warning
    printf("%d\n", i);
}

int main(int argc, char **)
{
    test_kernel<<<3, 10>>>();
    SafeCall(cudaDeviceSynchronize()); // errors raised during kernel execution surface here
    SafeCall(cudaGetLastError());
}
Steps to reproduce
- Create a test.cu file containing the code above.
- Compile it with:
  nvcc -o test test.cu -I/usr/local/include/eigen3 -std=c++17
- Run the resulting binary: ./test
What is the current bug behavior?
The CUDA runtime reports the following error, raised by the SafeCall(cudaDeviceSynchronize()) check:
Error: unspecified launch failure test.cu:36
What is the expected correct behavior?
The program should print the numbers 0 to 29, one per thread, in no particular order, since the 30 threads of the <<<3, 10>>> launch run concurrently.
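The expected values follow from the global index computed in the kernel, i = blockDim.x * blockIdx.x + threadIdx.x. The host-side sketch below only mirrors that arithmetic for a <<<3, 10>>> launch; the function expected_indices is a hypothetical name and is not part of the reproducer.

```cpp
#include <vector>

// Hypothetical helper: reproduce the kernel's index formula on the host for a
// launch of `blocks` blocks with `threads_per_block` threads each.
std::vector<int> expected_indices(int blocks, int threads_per_block)
{
    std::vector<int> out;
    for (int block_idx = 0; block_idx < blocks; ++block_idx)
        for (int thread_idx = 0; thread_idx < threads_per_block; ++thread_idx)
            // Same as blockDim.x * blockIdx.x + threadIdx.x with blockDim.x == threads_per_block.
            out.push_back(threads_per_block * block_idx + thread_idx);
    return out;
}
```

For expected_indices(3, 10) the values are 0 through 29, each exactly once; only the order in which the device printf calls appear is unspecified.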
Anything else that might help
Changing the compiler flag from -std=c++17 to -std=c++11, with everything else unchanged, makes the program run without any error.
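As a possible stopgap, the comparison can be written as an explicit coefficient loop so that neither operator== nor cwiseEqual(...).all() is instantiated in device code. This is a minimal sketch, assuming that size() and operator[] on Eigen::Vector3i are callable from device code; the helper coeffwise_equal, the macro HOST_DEVICE and the function coeffwise_equal_host_demo are hypothetical names, and the workaround has not been verified against this exact CUDA 11.1 / Eigen 3.3.9 setup.

```cpp
#include <array>

// Hypothetical macro: mark the helper as host and device code under nvcc,
// and as an ordinary function under a plain C++ compiler.
#if defined(__CUDACC__)
#define HOST_DEVICE __host__ __device__
#else
#define HOST_DEVICE
#endif

// Hypothetical helper: compare two vectors coefficient by coefficient with a
// plain loop instead of a == b or a.cwiseEqual(b).all().
// Works for any type providing size() and operator[] (assumed: Eigen::Vector3i).
template <typename Vec>
HOST_DEVICE bool coeffwise_equal(const Vec &a, const Vec &b)
{
    if (a.size() != b.size())
        return false;
    for (decltype(a.size()) k = 0; k < a.size(); ++k)
        if (a[k] != b[k])
            return false;
    return true;
}

// Host-side check using std::array as a stand-in for Eigen::Vector3i.
inline bool coeffwise_equal_host_demo()
{
    std::array<int, 3> a = {1, 2, 3}, b = {1, 2, 3}, c = {1, 2, 4};
    return coeffwise_equal(a, b) && !coeffwise_equal(a, c);
}
```

Inside the kernel, bool result = coeffwise_equal(a, b); would take the place of bool result = (a == b);.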