Changes to fast SQRT/RSQRT (!868) · Merge requests · libeigen / eigen

x86 processors from Skylake and Zen2 onwards have square root units with significantly higher throughput. As a result, our benchmarks show that Newton-Raphson iteration is counterproductive for SQRT when only SSE or AVX is available and corner cases must be handled correctly. This change therefore removes the corresponding specializations of internal::psqrt. Newton-Raphson is still a win for SQRT with AVX512, and for RSQRT with SSE, AVX, and AVX512.
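
To illustrate why corner-case handling adds cost to a Newton-Raphson path, here is a minimal scalar sketch of a reciprocal square root. It is not Eigen's packet code: `rsqrt_seed` and `rsqrt_newton` are hypothetical names, and the bit-trick seed is only a stand-in for a hardware approximation instruction. Without the explicit branches the iteration returns a finite value for 0 and a wrong result for inf.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>
#include <limits>

// Hypothetical helper: crude initial guess for 1/sqrt(x). It stands in for a
// hardware approximation and is only meaningful for positive normal x.
inline float rsqrt_seed(float x) {
  std::uint32_t bits;
  std::memcpy(&bits, &x, sizeof(bits));
  bits = 0x5f3759dfu - (bits >> 1);
  float y;
  std::memcpy(&y, &bits, sizeof(y));
  return y;
}

// Hypothetical scalar sketch of Newton-Raphson RSQRT with explicit corner
// cases. Subnormal inputs are outside the seed's valid range and are not
// handled accurately here.
inline float rsqrt_newton(float x) {
  const float inf = std::numeric_limits<float>::infinity();
  if (std::isnan(x) || x < 0.0f) return std::numeric_limits<float>::quiet_NaN();
  if (x == 0.0f) return std::copysign(inf, x);  // +0 -> +inf, -0 -> -inf
  if (std::isinf(x)) return 0.0f;

  float y = rsqrt_seed(x);
  for (int i = 0; i < 3; ++i) {
    // Newton-Raphson step for f(y) = 1/y^2 - x: y <- y * (1.5 - 0.5 * x * y^2)
    const float xy = x * y;
    y = y * (1.5f - 0.5f * xy * y);
  }
  return y;
}
```

In a vectorized implementation the branches would typically become compare and select operations applied to every lane, which is part of the overhead being weighed against a faster hardware square root.
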
Add a function for testing packet math functions on the IEEE special values {+,-} x {denorm_min, min, 0, inf, NaN}, and fix the generic SQRT/RSQRT implementations to pass this test. If EIGEN_FAST_MATH is 1, the test is relaxed for subnormal inputs: the function may instead return what the reference returns for the input flushed to zero with the same sign.
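
The special-value check described above can be sketched in scalar form as follows. This is not the test added by the merge request: every name here (`special_values`, `same_result`, `flush_subnormal`, `check_special_values`, `ref_sqrt`, `ftz_sqrt`) is hypothetical, the `relaxed` flag stands in for EIGEN_FAST_MATH being 1, and the 4-epsilon tolerance is an assumption chosen for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <limits>
#include <vector>

// The ten inputs {+,-} x {denorm_min, min, 0, inf, NaN}.
inline std::vector<float> special_values() {
  using L = std::numeric_limits<float>;
  const float mags[] = {L::denorm_min(), L::min(), 0.0f, L::infinity(),
                        L::quiet_NaN()};
  std::vector<float> v;
  for (float m : mags) {
    v.push_back(m);
    v.push_back(-m);
  }
  return v;
}

// Results agree if both are NaN, or if they have the same sign and are equal
// (exactly for 0 and inf, within an assumed tolerance otherwise).
inline bool same_result(float a, float b) {
  if (std::isnan(a) || std::isnan(b)) return std::isnan(a) && std::isnan(b);
  if (std::signbit(a) != std::signbit(b)) return false;
  if (std::isinf(a) || std::isinf(b) || b == 0.0f) return a == b;
  return std::fabs(a - b) <=
         4 * std::numeric_limits<float>::epsilon() * std::fabs(b);
}

// Replace a subnormal input by a zero of the same sign.
inline float flush_subnormal(float x) {
  return std::fpclassify(x) == FP_SUBNORMAL ? std::copysign(0.0f, x) : x;
}

// Compare fn against ref on every special value. In relaxed mode, fn may
// also match the reference evaluated on the flushed input.
template <typename Fn, typename Ref>
bool check_special_values(Fn fn, Ref ref, bool relaxed) {
  for (float x : special_values()) {
    const float got = fn(x);
    if (same_result(got, ref(x))) continue;
    if (relaxed && same_result(got, ref(flush_subnormal(x)))) continue;
    return false;
  }
  return true;
}

// Example functions: a reference sqrt, and one that flushes subnormal inputs.
inline float ref_sqrt(float x) { return std::sqrt(x); }
inline float ftz_sqrt(float x) { return std::sqrt(flush_subnormal(x)); }
```

With these definitions, `check_special_values(ftz_sqrt, ref_sqrt, false)` fails on the subnormal inputs, while the same call with `relaxed` set to `true` passes.
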

Edited Feb 18, 2022 by Rasmus Munk Larsen
