SSE/AVX Complex FMA (!1683) · Merge requests · libeigen / eigen

Reference issue

What does this implement/fix?

Adds SSE and AVX implementations of complex fused multiply-add and friends. These emit fewer instructions than composing pmadd(a,b,c) as padd(pmul(a,b),c), and are slightly more accurate. We should look into defining a generic packet op that streamlines pmul(pconj(a), b) and its FMA analogues, since we could get the conjugation for "free" by choosing the right intrinsics in the right order. We could call these pcmul(a,b) and pcmadd(a,b,c). They could be useful for squeezing a bit more performance out of dot products and the like.
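To illustrate why the conjugation can come for "free", here is a minimal scalar sketch, not the code in this MR. The helper names `cmadd_sketch` and `conj_cmadd_sketch` are hypothetical and are not Eigen API. It uses `std::fma` on the real and imaginary parts in place of packet intrinsics, and assumes the same idea carries over to the SIMD versions: conjugating `a` only flips the sign of two product terms, so it costs a different choice of fused op rather than an extra instruction.

```cpp
#include <cmath>
#include <complex>

// Hypothetical helper (illustrative name, not part of Eigen).
// Computes a*b + c using four fused multiply-adds:
//   re = ar*br - ai*bi + cr
//   im = ar*bi + ai*br + ci
inline std::complex<float> cmadd_sketch(std::complex<float> a,
                                        std::complex<float> b,
                                        std::complex<float> c) {
  const float re = std::fma(a.real(), b.real(),
                            std::fma(-a.imag(), b.imag(), c.real()));
  const float im = std::fma(a.real(), b.imag(),
                            std::fma(a.imag(), b.real(), c.imag()));
  return {re, im};
}

// Hypothetical helper (illustrative name, not part of Eigen).
// Computes conj(a)*b + c. Compared with cmadd_sketch, only the sign of the
// a.imag() terms changes, so the number of fused operations is the same:
//   re = ar*br + ai*bi + cr
//   im = ar*bi - ai*br + ci
inline std::complex<float> conj_cmadd_sketch(std::complex<float> a,
                                             std::complex<float> b,
                                             std::complex<float> c) {
  const float re = std::fma(a.real(), b.real(),
                            std::fma(a.imag(), b.imag(), c.real()));
  const float im = std::fma(a.real(), b.imag(),
                            std::fma(-a.imag(), b.real(), c.imag()));
  return {re, im};
}
```

Both helpers issue four fused operations; the conjugated variant differs only in which terms are negated. A proposed pcmadd(a,b,c) would presumably exploit the same property at the packet level.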

Additional information

Edited Aug 28, 2024 by Charles Schlosser
