
Improve AVX macro for FMA

Cristian Le requested to merge fix/avx-detection into main

Following a discussion with @hmenke, we can use the compiler-predefined macros __FMA__ and __FMA4__ instead. Tested locally on my laptop (where __FMA__ is defined) with CMake. Previously, the HAVE_FMA3 check via check_function_exists(_mm_fmadd_pd) never worked. _mm_fmadd_pd is a compiler intrinsic expanded inline from a header, so the link step has no actual symbol to find. Checking the predefined macro avoids that problem and works.
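A minimal sketch of how C code can branch on the predefined macro directly, assuming an x86 target and GCC, Clang, or icx. The helper name fused_mul_add is hypothetical and not taken from this project. FMA4 is left out because no FMA4 hardware was tested.

```c
/* Only pull in the x86 intrinsics header when the compiler says FMA3 is enabled
 * (e.g. via -mfma or -march=native on an FMA-capable CPU). */
#if defined(__FMA__)
#include <immintrin.h>
#endif

/* Hypothetical helper: returns a*b + c. */
static double fused_mul_add(double a, double b, double c)
{
#if defined(__FMA__)
    /* _mm_fmadd_pd is an intrinsic the compiler expands inline into an FMA3
     * instruction; there is no library symbol behind it to link against. */
    __m128d r = _mm_fmadd_pd(_mm_set1_pd(a), _mm_set1_pd(b), _mm_set1_pd(c));
    /* Extract the lowest double lane of the result. */
    return _mm_cvtsd_f64(r);
#else
    /* Portable fallback: separate multiply and add, which may round twice. */
    return a * b + c;
#endif
}
```

Because the branch is decided by the preprocessor, the same source builds with or without FMA enabled, and no configure-time link test is needed.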

To check which FMA-related macros a compiler predefines, run:

$ echo | gcc -march=native -dM -E - | grep FMA
#define __FP_FAST_FMA 1
#define __FP_FAST_FMAF 1
#define __FMA__ 1
#define __FP_FAST_FMAF32 1
#define __FP_FAST_FMAF64 1
#define __FP_FAST_FMAF32x 1
$ echo | clang -march=native -dM -E - | grep FMA
#define __FMA__ 1
$ echo | icx -march=native -dM -E - | grep FMA
#define __FMA__ 1
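The same check can be made from inside a C translation unit. The sketch below uses a hypothetical helper, fma_flavor (not part of this MR), that reports which of the two macros the compiler set for the current build flags.

```c
/* Hypothetical helper: names the FMA flavour enabled for this translation
 * unit, based only on the compiler-predefined macros. */
static const char *fma_flavor(void)
{
#if defined(__FMA__) && defined(__FMA4__)
    return "FMA3+FMA4";
#elif defined(__FMA__)
    return "FMA3";
#elif defined(__FMA4__)
    return "FMA4";
#else
    return "none";
#endif
}
```

Building it with and without -march=native on an FMA-capable machine should show the difference, since the default x86-64 target usually does not enable FMA.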

This has not been tested on machines without __FMA__ or with __FMA4__. As far as we have seen, CPUs with FMA4 are no longer produced.

Edited by Cristian Le
