Optimize generic_rsqrt_newton_step
Reference issue
What does this implement/fix?
Tweaks generic_rsqrt_newton_step in a few ways:
- change order of operations to improve accuracy
- eliminate a floating point comparison and constant
These tests use scalar_rsqrt_op<float>. The AVX path uses the _mm256_rsqrt_ps intrinsic to compute an initial guess, followed by a single Newton iteration.
Accuracy: for every float, compare the output to that calculated by MPFR's rec_sqrt. Negative ulps indicate that the value is less than the reference.
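
For readers unfamiliar with the technique, here is a minimal scalar sketch of one Newton refinement for a reciprocal square root, written two ways. It is an illustration only, not the code in this change: the function names `rsqrt_newton_classic` and `rsqrt_newton_residual` are hypothetical, and the residual form is just one common way that reordering the arithmetic can affect rounding.

```cpp
#include <cmath>

// One Newton-Raphson refinement of an approximate reciprocal square root.
//   x  : input, assumed finite and positive
//   y0 : initial guess for 1/sqrt(x), e.g. from a low-precision estimate
//
// Textbook form: y1 = y0 * (1.5 - 0.5 * x * y0 * y0)
inline float rsqrt_newton_classic(float x, float y0) {
  return y0 * (1.5f - 0.5f * x * y0 * y0);
}

// Algebraically equivalent residual form: y1 = y0 + (y0 / 2) * (1 - x * y0 * y0)
// The residual r is small when y0 is a good guess; the final fused
// multiply-add applies the correction to y0 with a single rounding.
inline float rsqrt_newton_residual(float x, float y0) {
  const float r = std::fma(-x * y0, y0, 1.0f);  // r = 1 - x * y0^2
  return std::fma(0.5f * y0, r, y0);            // y0 + (y0 / 2) * r
}
```

Both functions return the same value in exact arithmetic; they can differ in the last bits in floating point, which is the kind of effect the ulp measurements are meant to capture.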
- Before: worst ulps = -2.28266
- After: worst ulps = -1.97993
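
A signed ulp error like the ones above could be computed roughly as follows. This is a sketch under stated assumptions, not the harness used for these numbers: `signed_ulp_error` and `reference_rsqrt` are hypothetical helpers, and the reference here is double-precision `1.0 / std::sqrt(x)` as a stand-in for MPFR.

```cpp
#include <cmath>
#include <limits>

// Signed error of `approx` relative to `reference`, measured in units of the
// float spacing (ulp) at the reference value. A negative result means
// `approx` is below the reference.
// Assumes `reference` is finite, positive, and within the normal float range.
inline double signed_ulp_error(float approx, double reference) {
  const float ref_f = static_cast<float>(reference);
  // Distance from ref_f to the next representable float above it.
  const float next =
      std::nextafter(ref_f, std::numeric_limits<float>::infinity());
  const double ulp = static_cast<double>(next) - static_cast<double>(ref_f);
  return (static_cast<double>(approx) - reference) / ulp;
}

// Stand-in reference (assumption): double-precision 1/sqrt(x).
// The measurements in this description use MPFR instead.
inline double reference_rsqrt(float x) {
  return 1.0 / std::sqrt(static_cast<double>(x));
}
```

Looping this over every positive finite float and tracking the largest magnitude would give a "worst ulps" figure of the same kind, though a double reference is less rigorous than an arbitrary-precision one.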
Speed: (ms, repeats = (1 << 30) / size; a negative Diff means the new code is faster)
| Size | Old | New | Diff |
|---|---|---|---|
| 32000 | 283.879 | 194.964 | -31.3% |
| 64000 | 216.68 | 180.354 | -16.8% |
| 128000 | 216.34 | 208.688 | -3.5% |
| 256000 | 212.886 | 193.805 | -9.0% |
| 512000 | 212.307 | 197.84 | -6.8% |
| 1024000 | 217.387 | 193.233 | -11.1% |
| 2048000 | 268.367 | 248.117 | -7.5% |
| 4096000 | 331.165 | 375.633 | 13.4% |
| 8192000 | 370.818 | 355.267 | -4.2% |
| 16384000 | 361.733 | 359.107 | -0.7% |
| 32768000 | 365.886 | 350.384 | -4.2% |
| 65536000 | 365.256 | 353.878 | -3.1% |
| 131072000 | 360.055 | 348.825 | -3.1% |
| 262144000 | 363.505 | 346.75 | -4.6% |
| 524288000 | 362.604 | 360.668 | -0.5% |
| 1048576000 | 365.782 | 349.517 | -4.4% |
Additional information
Edited by Charles Schlosser