Remove inline assembly for FMA (AVX) and add remaining extensions as packet ops: pmsub, pnmadd, and pnmsub. (!824) · Merge requests · libeigen / eigen

Adding the remaining variants can save explicit negations in various low-level implementations. In a follow-up to this change, they will be used to make preciprocal IEEE-compliant with minimal overhead.
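
For reference, the sign conventions of the four ops can be sketched with scalar stand-ins. This assumes the packet op names follow the x86 FMA instruction family (vfmadd, vfmsub, vfnmadd, vfnmsub). The `*_ref` functions are hypothetical helpers for illustration, not Eigen API, and they use `std::fma` to model the fused hardware behavior.

```cpp
#include <cmath>

// Hypothetical scalar models of the packet ops (illustration only, not Eigen code).
// Each performs a single fused multiply-add, so the product is not rounded separately.

// a * b + c
inline float pmadd_ref(float a, float b, float c) { return std::fma(a, b, c); }

// a * b - c
inline float pmsub_ref(float a, float b, float c) { return std::fma(a, b, -c); }

// -(a * b) + c
inline float pnmadd_ref(float a, float b, float c) { return std::fma(-a, b, c); }

// -(a * b) - c
inline float pnmsub_ref(float a, float b, float c) { return std::fma(-a, b, -c); }
```

With only pmadd available, the last three forms need an explicit negation of an operand or of the result. That negation is the overhead the additional ops are meant to avoid.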

This change also removes the old register-spilling workaround in Eigen/src/Core/arch/AVX/PacketMath.h, which appears to be counterproductive on modern compiler/CPU combinations. For example, compiling a matrix multiplication benchmark with clang 11 without the workaround yields the following speedups on a Skylake core, in addition to improving readability:

| flags | speedup |
|---|---|
| -march=skylake | 25% (!) |
| -mavx -mfma | 12% (!) |
| -mavx | unchanged |
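
As a rough illustration of the intrinsic-based style that replaces the inline assembly, the sketch below computes a fused multiply-add over eight floats with `_mm256_fmadd_ps` and leaves instruction selection and register allocation to the compiler. The helper `fmadd8` is hypothetical and is not the code in this merge request. The scalar fallback is included only so the snippet also builds without `-mavx -mfma`.

```cpp
#include <cstddef>
#if defined(__AVX__) && defined(__FMA__)
#include <immintrin.h>
#endif

// Hypothetical helper (not Eigen API): out[i] = a[i] * b[i] + c[i] for i in [0, 8).
inline void fmadd8(const float* a, const float* b, const float* c, float* out) {
#if defined(__AVX__) && defined(__FMA__)
  // Unaligned loads keep the sketch safe for arbitrary pointers.
  const __m256 va = _mm256_loadu_ps(a);
  const __m256 vb = _mm256_loadu_ps(b);
  const __m256 vc = _mm256_loadu_ps(c);
  // The compiler chooses the FMA operand ordering and the registers.
  _mm256_storeu_ps(out, _mm256_fmadd_ps(va, vb, vc));
#else
  // Fallback when AVX/FMA are not enabled at compile time (unfused).
  for (std::size_t i = 0; i < 8; ++i) out[i] = a[i] * b[i] + c[i];
#endif
}
```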

Closes #2231 (closed)

Edited Jan 26, 2022 by Rasmus Munk Larsen
