Replace calls to numext::fma with numext::madd.

The function numext::fma should be reserved for cases that actually need the extended precision of a fused multiply-add, where x * y + z is rounded only once. Where the extra precision is not necessary, numext::madd will try to do the "best" thing:

  • Use FMA if there is a CPU instruction for it (i.e. EIGEN_VECTORIZE_FMA)
  • Otherwise, fall back to x * y + z
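The two-way dispatch above could be sketched roughly as follows. This is a minimal illustration, not Eigen's actual implementation: the namespace `madd_sketch` and this `madd` function are hypothetical, and the standard `FP_FAST_FMA` macro from `<cmath>` is used as a stand-in assumption for the role the description gives to EIGEN_VECTORIZE_FMA.

```cpp
#include <cmath>

namespace madd_sketch {

// Hypothetical stand-in for numext::madd (double only); an assumption
// for illustration, not the code from this merge request.
inline double madd(double x, double y, double z) {
#ifdef FP_FAST_FMA
  // The standard library reports that std::fma is at least as fast as
  // x * y + z, which usually indicates a hardware FMA instruction.
  return std::fma(x, y, z);
#else
  // No fast FMA reported: use a plain multiply followed by an add and
  // avoid the slow software-emulated fused operation.
  return x * y + z;
#endif
}

}  // namespace madd_sketch
```

Callers that do not depend on the single rounding would call `madd_sketch::madd(x, y, z)` and get whichever form is cheaper on the target, while code that genuinely needs the fused result would keep calling the fma routine directly.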

This helps prevent excessive slowdowns. For example, with emscripten/WASM, the software-emulated FMA implementation is about 30x slower than a basic multiply-add. On Intel/AMD CPUs, the emulated FMA seems to be 3-5x slower than multiply-add. If FMA CPU instructions are available, FMA seems to perform on par with multiply-add, so we get the extra precision for free.
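To make "extra precision" concrete, here is a small sketch using only the standard `std::fma` and `std::ldexp`. The namespace `fma_sketch` and the function name are hypothetical. With x = 1 + 2^-27 and y = 1 - 2^-27, the exact product is 1 - 2^-54, which is not representable as a double and rounds to 1.0. A separate multiply and add therefore yields 0 for x * y - 1 (unless the compiler contracts the expression into an FMA), while the fused operation keeps the exact product and returns -2^-54.

```cpp
#include <cmath>

namespace fma_sketch {

// Computes (1 + 2^-27) * (1 - 2^-27) - 1 with a single rounding.
// The exact product is 1 - 2^-54, so the fused result is exactly -2^-54.
inline double fused_residual() {
  const double eps = std::ldexp(1.0, -27);  // 2^-27, exactly representable
  const double x = 1.0 + eps;               // exact in double precision
  const double y = 1.0 - eps;               // exact in double precision
  return std::fma(x, y, -1.0);
}

}  // namespace fma_sketch
```

Code that relies on recovering such a residual is the kind of case where the fused operation is required rather than optional.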

Fixes #2959 (closed).
