Improve AVX macro for FMA
According to a discussion with @hmenke, we can use the compiler's predefined macros `__FMA__` and `__FMA4__`. Tested locally on my laptop and in CMake, where `__FMA__` is defined. Previously, the `HAVE_FMA3` check via `check_function_exists(_mm_fmadd_pd)` didn't actually work: `_mm_fmadd_pd` is a compiler intrinsic defined inline in a header, so there is no symbol for the link test to find. With the predefined macro the check works.
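A minimal sketch of how C code can branch on these macros at preprocessing time instead of relying on a configure-time `HAVE_FMA3` result. It assumes a GCC/Clang-style `<x86intrin.h>` that exposes both the FMA3 (`_mm_fmadd_pd`) and FMA4 (`_mm_macc_pd`) intrinsics; the helper names `fmadd2` and `fma_path` are hypothetical and not taken from this project.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__FMA__) || defined(__FMA4__)
/* GCC/Clang umbrella header; pulls in both fmaintrin.h and fma4intrin.h */
#include <x86intrin.h>
#endif

/* Hypothetical helper: reports which branch was compiled in. */
static const char *fma_path(void) {
#if defined(__FMA__)
    return "fma3";
#elif defined(__FMA4__)
    return "fma4";
#else
    return "scalar";
#endif
}

/* Hypothetical helper: out[i] = a[i] * b[i] + c[i] for i = 0, 1. */
static void fmadd2(const double *a, const double *b, const double *c, double *out) {
    assert(a != NULL && b != NULL && c != NULL && out != NULL);
#if defined(__FMA__)
    /* FMA3 (Intel/AMD): fused multiply-add, a single rounding */
    _mm_storeu_pd(out, _mm_fmadd_pd(_mm_loadu_pd(a), _mm_loadu_pd(b), _mm_loadu_pd(c)));
#elif defined(__FMA4__)
    /* FMA4 (older AMD): same result, different intrinsic name */
    _mm_storeu_pd(out, _mm_macc_pd(_mm_loadu_pd(a), _mm_loadu_pd(b), _mm_loadu_pd(c)));
#else
    /* Portable fallback; the compiler may or may not fuse this */
    for (int i = 0; i < 2; ++i) {
        out[i] = a[i] * b[i] + c[i];
    }
#endif
}
```

Built without `-march=native` (or `-mfma`) on x86-64, neither macro should be defined and the scalar branch is selected; the FMA4 branch is untested here for the same reason given further down.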
To check which FMA-related macros your compiler predefines, run:
$ echo | gcc -march=native -dM -E - | grep FMA
#define __FP_FAST_FMA 1
#define __FP_FAST_FMAF 1
#define __FMA__ 1
#define __FP_FAST_FMAF32 1
#define __FP_FAST_FMAF64 1
#define __FP_FAST_FMAF32x 1
$ echo | clang -march=native -dM -E - | grep FMA
#define __FMA__ 1
$ echo | icx -march=native -dM -E - | grep FMA
#define __FMA__ 1
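The commands above inspect the predefined macros from the shell. A rough equivalent from inside a C translation unit, which can help confirm that the flags CMake passes actually reach the compiler, might look like this. The `HAS_*` constants and the functions `fma_macro_flags` and `print_fma_macros` are hypothetical names for this sketch.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical bit flags, one per FMA-related predefined macro. */
enum {
    HAS_FMA3 = 1 << 0,       /* __FMA__ */
    HAS_FMA4 = 1 << 1,       /* __FMA4__ */
    HAS_FP_FAST_FMA = 1 << 2 /* __FP_FAST_FMA: only GCC printed it above */
};

/* Records, at compile time, which macros this translation unit sees. */
static int fma_macro_flags(void) {
    int flags = 0;
#ifdef __FMA__
    flags |= HAS_FMA3;
#endif
#ifdef __FMA4__
    flags |= HAS_FMA4;
#endif
#ifdef __FP_FAST_FMA
    flags |= HAS_FP_FAST_FMA;
#endif
    return flags;
}

/* Prints one line per macro, similar in spirit to `-dM -E - | grep FMA`. */
static void print_fma_macros(FILE *out) {
    assert(out != NULL);
    int flags = fma_macro_flags();
    fprintf(out, "__FMA__        %s\n", (flags & HAS_FMA3) ? "defined" : "not defined");
    fprintf(out, "__FMA4__       %s\n", (flags & HAS_FMA4) ? "defined" : "not defined");
    fprintf(out, "__FP_FAST_FMA  %s\n", (flags & HAS_FP_FAST_FMA) ? "defined" : "not defined");
}
```

On an FMA3-capable x86-64 machine, compiling once with `-march=native` and once without should flip the `__FMA__` line.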
I have not tested this on machines without `__FMA__` or with `__FMA4__`. As far as we've seen, CPUs with FMA4 are no longer produced.
Edited by Cristian Le