Optimize generic_rsqrt_newton_step (!1276) · Merge requests · libeigen / eigen

Reference issue

What does this implement/fix?

Tweaks generic_rsqrt_newton_step in a few ways:

change order of operations to improve accuracy
eliminate a floating-point comparison and constant
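
To illustrate why the order of operations matters, here is a minimal scalar sketch of one Newton-Raphson step for 1/sqrt(a), written in two algebraically equivalent ways. The names `newton_step_v1` and `newton_step_v2` are hypothetical, and neither function is claimed to be the code before or after this merge request; they only show that the same formula can be evaluated in different orders, which can change the rounding of the last bits.

```cpp
#include <cmath>

// Both functions refine an estimate y of 1/sqrt(a) with one Newton-Raphson step:
//   y' = y * (1.5 - 0.5 * a * y * y)

// Direct evaluation of the textbook formula.
float newton_step_v1(float a, float y) {
    return y * (1.5f - (0.5f * a) * (y * y));
}

// Residual form: r = 1 - a*y*y, then y' = y + 0.5*y*r.
// Algebraically identical to v1, but the intermediate roundings differ.
float newton_step_v2(float a, float y) {
    const float r = std::fma(-a * y, y, 1.0f);  // residual, small when y is close
    return std::fma(0.5f * y, r, y);            // apply the correction to y
}
```

When y is already exact, both forms return it unchanged: `newton_step_v1(4.0f, 0.5f)` and `newton_step_v2(4.0f, 0.5f)` both yield 0.5f. For inexact estimates the two results can differ by an ulp or so, which is the kind of difference the accuracy figures below measure.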

These tests use scalar_rsqrt_op<float>. The AVX path uses the _mm256_rsqrt_ps intrinsic to compute an initial guess, followed by a single Newton iteration.

Accuracy: for every float, compare the output to the value calculated by MPFR's rec_sqrt. A negative ulp error indicates that the computed value is less than the reference.

Before: worst ulps = -2.28266
After: worst ulps = -1.97993
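
A signed ulp error of this kind could be computed along the following lines. This is a sketch under stated assumptions, not the harness used for the numbers above: a double-precision `1.0 / std::sqrt(x)` stands in for the MPFR reference, the ulp is taken as the float spacing at the reference value, denormal references are ignored, and the names `signed_ulp_error` and `worst_ulp_error` are hypothetical.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>

// Signed error of `computed` relative to `reference`, in float ulps.
// Negative means `computed` is below the reference.
double signed_ulp_error(float computed, double reference) {
    const int exponent = std::ilogb(reference);          // floor(log2(|reference|))
    const double ulp = std::ldexp(1.0, exponent - 23);   // float spacing in [2^e, 2^(e+1))
    return (static_cast<double>(computed) - reference) / ulp;
}

// Scan the floats whose bit patterns lie in [first_bits, last_bits] and
// return the signed error with the largest magnitude.
// Assumes the range covers positive, normal floats only.
double worst_ulp_error(float (*f)(float), std::uint32_t first_bits, std::uint32_t last_bits) {
    double worst = 0.0;
    for (std::uint64_t b = first_bits; b <= last_bits; ++b) {
        const std::uint32_t bits = static_cast<std::uint32_t>(b);
        float x;
        std::memcpy(&x, &bits, sizeof x);                // reinterpret bits as a float
        // Stand-in reference (assumption): double precision instead of MPFR.
        const double reference = 1.0 / std::sqrt(static_cast<double>(x));
        const double err = signed_ulp_error(f(x), reference);
        if (std::fabs(err) > std::fabs(worst)) worst = err;
    }
    return worst;
}
```

Covering every float, as in the test described above, means running the scan over all positive normal bit patterns, that is, from 0x00800000 to 0x7f7fffff.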

Speed (time in ms; repeats = (1 << 30) / size):

Size	Old (ms)	New (ms)	Diff
32000	283.879	194.964	-31.3%
64000	216.68	180.354	-16.8%
128000	216.34	208.688	-3.5%
256000	212.886	193.805	-9.0%
512000	212.307	197.84	-6.8%
1024000	217.387	193.233	-11.1%
2048000	268.367	248.117	-7.5%
4096000	331.165	375.633	+13.4%
8192000	370.818	355.267	-4.2%
16384000	361.733	359.107	-0.7%
32768000	365.886	350.384	-4.2%
65536000	365.256	353.878	-3.1%
131072000	360.055	348.825	-3.1%
262144000	363.505	346.75	-4.6%
524288000	362.604	360.668	-0.5%
1048576000	365.782	349.517	-4.4%
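
The repeat count used for the table can be sketched as below, together with a minimal timing loop. This is a hypothetical harness, not the one that produced the numbers above: `repeats_for` and `time_rsqrt_ms` are made-up names, and a plain scalar `1.0f / std::sqrt(x)` loop stands in for the Eigen expression being benchmarked.

```cpp
#include <chrono>
#include <cmath>
#include <cstddef>
#include <vector>

// Number of passes for a given array size: (1 << 30) / size, rounded down.
std::size_t repeats_for(std::size_t size) {
    return (std::size_t{1} << 30) / size;
}

// Time `repeats` passes of 1/sqrt over `size` floats; returns milliseconds.
// Requires size > 0.
double time_rsqrt_ms(std::size_t size, std::size_t repeats) {
    std::vector<float> in(size, 2.0f), out(size);
    const auto start = std::chrono::steady_clock::now();
    for (std::size_t r = 0; r < repeats; ++r) {
        for (std::size_t i = 0; i < size; ++i) {
            out[i] = 1.0f / std::sqrt(in[i]);   // stand-in for the operation under test
        }
    }
    const auto stop = std::chrono::steady_clock::now();
    volatile float sink = out[size / 2];        // keep the result observable
    (void)sink;
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

With this choice every row processes roughly 2^30 elements in total (for example, `repeats_for(32000)` is 33554), so the times in different rows are directly comparable, and the larger times at larger sizes reflect the working set no longer fitting in cache rather than more work being done.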

Additional information

Edited Mar 24, 2023 by Charles Schlosser
