Fix guard macros for emulated FP16 operators on GPU
What does this implement/fix?
This change fixes the following two issues:
- The macro guards for emulating FP16 operations have slightly different conditions for the `push_macro` and the `pop_macro` pragma. This can result in an error when compiling with `__CUDA__` defined but `EIGEN_CUDACC` not defined. These guards originally matched, but only the `push_macro` guard was updated in a previous commit.
- The comment on line 459 claims that these emulated FP16 operations should be available for both HIP and CUDA, but `EIGEN_CUDACC` is used instead of `EIGEN_GPUCC`.
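
The matched-guard pattern described above can be sketched as follows. This is a minimal illustration, not the actual Eigen source: `DEMO_GPUCC` and `DEMO_DEVICE_FUNC` are hypothetical stand-ins for the real condition and the macro being overridden. The point it shows is that the `push_macro` and `pop_macro` pragmas sit behind the same condition, so the original definition is always restored.

```cpp
// Sketch only: the DEMO_* names are hypothetical and are not Eigen macros.
#include <cassert>

// Definition that the rest of the header is expected to see.
#define DEMO_DEVICE_FUNC 1

// Hypothetical single condition shared by both guards
// (assumed to play the role of one GPU-compiler check).
#define DEMO_GPUCC 1

// Push: save the current definition, then override it for the emulated ops.
#if defined(DEMO_GPUCC)
#pragma push_macro("DEMO_DEVICE_FUNC")
#undef DEMO_DEVICE_FUNC
#define DEMO_DEVICE_FUNC 2
#endif

// Code between the guards sees the overridden definition.
inline int value_inside_guard() { return DEMO_DEVICE_FUNC; }

// Pop: guarded by exactly the same condition as the push above.
#if defined(DEMO_GPUCC)
#pragma pop_macro("DEMO_DEVICE_FUNC")
#endif

// Code after the guards sees the original definition again.
inline int value_after_guard() { return DEMO_DEVICE_FUNC; }

// Small self-check of the behaviour described in the comments.
inline void check_guard_demo() {
  assert(value_inside_guard() == 2);
  assert(value_after_guard() == 1);
}
```

If the two `#if` conditions differed, a build where only one of them held would either leave the override in place for the rest of the translation unit or pop a definition that was never pushed.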
Edited by Ryan Senanayake