Fix guard macros for emulated FP16 operators on GPU

What does this implement/fix?

This change fixes the following two issues:

  • The macro guards for the emulated FP16 operations use slightly different conditions for the push_macro and the pop_macro. This can cause a compilation error when __CUDA__ is defined but EIGEN_CUDACC is not. The two guards originally matched, but a previous commit updated only the push_macro guard.
  • The comment on line 459 states that these emulated FP16 operations should be available for both HIP and CUDA, but the guard checks EIGEN_CUDACC instead of EIGEN_GPUCC.
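A minimal sketch of the general pattern, not the code from this change: deriving both the push_macro and pop_macro guards from a single condition keeps them from drifting apart, which is the kind of mismatch described in the first item. All names here are hypothetical stand-ins: `DEVICE_FUNC_TAG` plays the role of EIGEN_DEVICE_FUNC (it expands to a string so its value can be checked at runtime), and `SKETCH_GPUCC` plays the role of EIGEN_GPUCC, which covers both CUDA and HIP builds.

```cpp
#include <cassert>  // for assert() in a caller's checks
#include <cstring>  // std::strcmp

// Hypothetical stand-in for EIGEN_DEVICE_FUNC. It expands to a string
// instead of __host__/__device__ so its value can be inspected at runtime.
#define DEVICE_FUNC_TAG "default"

// Hypothetical stand-in for EIGEN_GPUCC (set for both CUDA and HIP builds).
// Comment this out to simulate a host-only build.
#define SKETCH_GPUCC 1

// Single source of truth for the guard: the push and the pop below both
// test this one macro, so their conditions cannot drift apart.
#if defined(SKETCH_GPUCC)
#define SKETCH_EMULATE_FP16_OVERRIDE 1
#else
#define SKETCH_EMULATE_FP16_OVERRIDE 0
#endif

#if SKETCH_EMULATE_FP16_OVERRIDE
#pragma push_macro("DEVICE_FUNC_TAG")  // save the current definition
#undef DEVICE_FUNC_TAG
#define DEVICE_FUNC_TAG "host+device"
#endif

// Region of emulated FP16 operators: code here sees the overridden tag.
inline const char* emulated_region_tag() { return DEVICE_FUNC_TAG; }

#if SKETCH_EMULATE_FP16_OVERRIDE
#pragma pop_macro("DEVICE_FUNC_TAG")  // restore the saved definition
#endif

// After the region the original definition must be back in effect. If the
// pop were guarded by a narrower condition than the push, builds that meet
// only the push condition would still see "host+device" here.
inline const char* after_region_tag() { return DEVICE_FUNC_TAG; }

// Small helper for comparing the observed tags.
inline bool tag_is(const char* got, const char* want) {
  return std::strcmp(got, want) == 0;
}
```

With both guards tied to one condition, the override is either applied and undone together or not applied at all, so it cannot leak past the emulated-operator region.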
Edited by Ryan Senanayake