SSE/AVX use fmaddsub for complex products (!1663) · Merge requests · libeigen / eigen

Reference issue

What does this implement/fix?

Interestingly, clang does not automatically fuse vmulps and vaddsubps into vfmaddsub, even with -ffast-math. This MR micro-optimizes the SSE/AVX complex multiplication kernels by using fmaddsub explicitly. The same approach is already implemented for AVX512.

Clang (x86-64 trunk) -O3 -DNDEBUG -mavx2 -mfma:

| Old | New |
| --- | --- |
| vmovsldup | vmovshdup |
| vmulps | vshufps |
| vmovshdup | vmulps |
| vshufps | vmovsldup |
| vmulps | vfmaddsub213ps |
| vaddsubps | |
| ret | ret |
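
To make the "New" column easier to follow, here is a minimal scalar sketch of the lane arithmetic. It is not the code from this MR and uses no Eigen API: `Packet4f`, `swap_re_im`, `mul`, and `complex_mul_model` are hypothetical names chosen for illustration, and each helper only models the instruction it is named after. A 128-bit register is assumed to hold two interleaved complex floats, `[re0, im0, re1, im1]`.

```cpp
#include <array>
#include <cmath>
#include <cstddef>

// Hypothetical model of one 128-bit register: [re0, im0, re1, im1].
using Packet4f = std::array<float, 4>;

// Models vmovsldup: duplicate the even (real) lanes.
inline Packet4f movsldup(const Packet4f& a) { return {a[0], a[0], a[2], a[2]}; }

// Models vmovshdup: duplicate the odd (imaginary) lanes.
inline Packet4f movshdup(const Packet4f& a) { return {a[1], a[1], a[3], a[3]}; }

// Models the vshufps step (assumed here to swap re and im within each complex).
inline Packet4f swap_re_im(const Packet4f& b) { return {b[1], b[0], b[3], b[2]}; }

// Models vmulps: lane-wise product.
inline Packet4f mul(const Packet4f& a, const Packet4f& b) {
  Packet4f r{};
  for (std::size_t i = 0; i < 4; ++i) r[i] = a[i] * b[i];
  return r;
}

// Models vfmaddsub: even lanes compute a*b - c, odd lanes compute a*b + c,
// each with a single rounding (std::fma).
inline Packet4f fmaddsub(const Packet4f& a, const Packet4f& b, const Packet4f& c) {
  Packet4f r{};
  for (std::size_t i = 0; i < 4; ++i)
    r[i] = std::fma(a[i], b[i], (i % 2 == 0) ? -c[i] : c[i]);
  return r;
}

// Complex product of two packed complex pairs, following the "New" sequence:
//   t      = [a_im*b_im, a_im*b_re, ...]              (vmovshdup, vshufps, vmulps)
//   result = [a_re*b_re - t0, a_re*b_im + t1, ...]    (vmovsldup, vfmaddsub)
inline Packet4f complex_mul_model(const Packet4f& a, const Packet4f& b) {
  const Packet4f t = mul(movshdup(a), swap_re_im(b));
  return fmaddsub(movsldup(a), b, t);
}
```

In this model the final multiply and the alternating add/subtract collapse into one fused step, which is where the "Old" sequence needed a separate vmulps followed by vaddsubps.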

There may be some creative approaches to implementing pmadd, but they were not apparent to me.
