Changes to fast SQRT/RSQRT
-
x86 processors from Skylake and Zen2 onwards have square root units with significantly higher throughput. Our benchmarking shows that, when only SSE or AVX is available and corner cases must be handled properly, using Newton-Raphson iteration for SQRT is counterproductive on these processors. This change therefore removes the corresponding specializations of internal::psqrt. Newton-Raphson is still a win for SQRT with AVX512, and for RSQRT with SSE, AVX, and AVX512.
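For readers unfamiliar with the technique, here is a minimal scalar sketch of one Newton-Raphson refinement step for RSQRT, including the kind of corner-case handling that adds to its cost. It is not the code from this change: `approx_rsqrt` and `rsqrt_newton` are hypothetical names, and the estimate is emulated in software rather than taken from a hardware approximation instruction.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical stand-in for a low-precision hardware estimate of 1/sqrt(x).
// Assumption: the relative error is about 2^-12; here it is emulated by
// perturbing the exact result.
inline float approx_rsqrt(float x) {
  return (1.0f / std::sqrt(x)) * (1.0f + 1.0f / 4096.0f);
}

// One Newton-Raphson step for f(y) = 1/y^2 - x:
//   y_next = y * (1.5 - 0.5 * x * y * y)
inline float rsqrt_newton(float x) {
  const float y = approx_rsqrt(x);
  const float refined = y * (1.5f - 0.5f * x * y * y);
  // Corner cases: for x == 0 the estimate is +/-inf and for x == +inf it is 0,
  // so x * y * y evaluates to 0 * inf = NaN. Keep the unrefined estimate there.
  // Negative and NaN inputs already yield NaN on both paths.
  return (x == 0.0f || std::isinf(x)) ? y : refined;
}
```

The final select is the extra work referred to above: without it, the refinement step turns the correct results for 0 and inf into NaN.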
-
Add a function for testing packet math functions on the IEEE special values {+,-} x {denorm_min, min, 0, inf, NaN}, and fix the generic SQRT/RSQRT implementations so that they pass this test. If EIGEN_FAST_MATH is 1, the test is relaxed for subnormal inputs: the function may also return the value that the reference returns when the input is flushed to a zero of the same sign.
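The check described above can be sketched in scalar form as follows. This is an illustration under assumptions, not the test added by this change: `flush_subnormal`, `same_result`, and `check_special_values` are hypothetical names, and the `fast_math` parameter stands in for EIGEN_FAST_MATH being 1.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Hypothetical helper: replace a subnormal by a zero of the same sign.
inline float flush_subnormal(float x) {
  return std::fpclassify(x) == FP_SUBNORMAL ? std::copysign(0.0f, x) : x;
}

// Two results match if both are NaN, or if they are equal with the same sign
// bit (so that +0 and -0 are told apart).
inline bool same_result(float a, float b) {
  if (std::isnan(a) || std::isnan(b)) return std::isnan(a) && std::isnan(b);
  return a == b && std::signbit(a) == std::signbit(b);
}

// Compare fn against ref on {+,-} x {denorm_min, min, 0, inf, NaN}.
template <typename Fn, typename RefFn>
bool check_special_values(Fn fn, RefFn ref, bool fast_math) {
  using lim = std::numeric_limits<float>;
  const float mags[] = {lim::denorm_min(), lim::min(), 0.0f,
                        lim::infinity(), lim::quiet_NaN()};
  const float signs[] = {1.0f, -1.0f};
  for (float m : mags) {
    for (float s : signs) {
      const float x = std::copysign(m, s);
      const float got = fn(x);
      bool ok = same_result(got, ref(x));
      // Relaxed mode: for subnormal inputs, also accept the reference result
      // for the input flushed to a zero of the same sign.
      if (!ok && fast_math && std::fpclassify(x) == FP_SUBNORMAL) {
        ok = same_result(got, ref(flush_subnormal(x)));
      }
      if (!ok) return false;
    }
  }
  return true;
}
```

With this sketch, an implementation that flushes subnormal inputs to zero fails the strict check but passes the relaxed one, while an implementation identical to the reference passes both.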