Enabling fmadd intrinsics with Clang
Tested on Linux with Clang 12.
It could be worth enabling the FMA intrinsics in pmadd on recent versions of Clang. Removing the deactivation condition || (EIGEN_COMP_CLANG) in src/Core/arch/AVX/PacketMath.h sped up matrix multiplication by up to 15% in certain cases, as measured with bench.cpp.
Compiled with:
clang++ bench.cpp -O3 -mavx2 -mfma -mtune=haswell -lbenchmark -lpthread -DNDEBUG -I.
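To make the comparison concrete, here is a minimal standalone sketch (not Eigen code) of two ways to emit a fused multiply-add on eight floats. The first uses the _mm256_fmadd_ps intrinsic, which the Clang deactivation condition bypasses. The second uses inline assembly pinned to vfmadd231ps. As far as I recall, that is what the guarded branch in PacketMath.h falls back to, but this is worth double-checking against the actual source. The function names are hypothetical. The target attribute and __builtin_cpu_supports are GCC/Clang extensions, used here so the sketch compiles without -mfma and falls back to std::fma on other CPUs.

```cpp
#include <cmath>

// Hypothetical helpers (not Eigen API): each computes c[i] = a[i]*b[i] + c[i]
// for 8 floats, the operation pmadd performs on a Packet8f.

// Portable fallback: one rounding per element, like the hardware FMA.
static void madd8_scalar(const float* a, const float* b, float* c) {
  for (int i = 0; i < 8; ++i) c[i] = std::fma(a[i], b[i], c[i]);
}

#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
#include <immintrin.h>
#define HAVE_X86_FMA_PATHS 1

// Path A: the intrinsic. The compiler chooses the vfmadd form
// (132/213/231) and the register allocation.
__attribute__((target("avx,fma")))
static void madd8_fma_intrinsic(const float* a, const float* b, float* c) {
  __m256 va = _mm256_loadu_ps(a);
  __m256 vb = _mm256_loadu_ps(b);
  __m256 vc = _mm256_loadu_ps(c);
  _mm256_storeu_ps(c, _mm256_fmadd_ps(va, vb, vc));  // c = a*b + c
}

// Path B: inline asm pinned to vfmadd231ps. In AT&T order this computes
// res = b*a + res, so the accumulator stays in place.
__attribute__((target("avx,fma")))
static void madd8_fma_asm(const float* a, const float* b, float* c) {
  __m256 va = _mm256_loadu_ps(a);
  __m256 vb = _mm256_loadu_ps(b);
  __m256 res = _mm256_loadu_ps(c);
  __asm__("vfmadd231ps %[a], %[b], %[c]"
          : [c] "+x"(res)
          : [a] "x"(va), [b] "x"(vb));
  _mm256_storeu_ps(c, res);
}

// Runtime check so the AVX/FMA code only runs on CPUs that support it.
static bool cpu_has_fma() {
  return __builtin_cpu_supports("avx") && __builtin_cpu_supports("fma");
}
#endif

// Dispatchers: take the FMA path when available, otherwise use std::fma.
void madd8_intrinsic(const float* a, const float* b, float* c) {
#ifdef HAVE_X86_FMA_PATHS
  if (cpu_has_fma()) { madd8_fma_intrinsic(a, b, c); return; }
#endif
  madd8_scalar(a, b, c);
}

void madd8_asm(const float* a, const float* b, float* c) {
#ifdef HAVE_X86_FMA_PATHS
  if (cpu_has_fma()) { madd8_fma_asm(a, b, c); return; }
#endif
  madd8_scalar(a, b, c);
}
```

Both paths compute the same values. Comparing their disassembly (for example with objdump -d) on a given Clang version is one way to check whether the intrinsic path still produces the extra register moves that the inline-asm workaround was meant to avoid.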
