Optimize generic_rsqrt_newton_step

Reference issue

What does this implement/fix?

Tweaks generic_rsqrt_newton_step in two ways:

  • change the order of operations to improve accuracy
  • eliminate a floating-point comparison and a constant
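Below is a scalar sketch of the two changes, under stated assumptions. The function names rsqrt_step_classic and rsqrt_step_residual are hypothetical, and this is not Eigen's packet code. Both functions apply one Newton-Raphson step for 1/sqrt(x), which can be written y1 = y0 * (1.5 - 0.5 * x * y0^2) or, equivalently, y1 = y0 - 0.5 * y0 * (x * y0^2 - 1). The second form first computes the small residual h = x * y0^2 - 1 with a fused multiply-add and then scales it, which is one plausible reading of "change the order of operations". The sketch also shows how a comparison and a constant can disappear. When x is 0 or infinity, the guess is infinity or 0, so the product x * y0 is NaN, and a single NaN test replaces the separate checks for zero and infinity.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Hypothetical scalar model of the "before" step: one Newton-Raphson
// iteration y1 = y0 * (1.5 + (-0.5 * y0) * (x * y0)).
// Zero and infinite guesses are returned unchanged, which costs two
// comparisons and an infinity constant.
float rsqrt_step_classic(float x, float y) {
  float y_new = y * (1.5f + (-0.5f * y) * (x * y));
  bool passthrough = (y == 0.0f) || (y == std::numeric_limits<float>::infinity());
  return passthrough ? y : y_new;
}

// Hypothetical scalar model of the "after" step: compute the residual
// h = x*y*y - 1 (close to 0 for a good guess) with an FMA, then apply the
// correction y - 0.5*y*h as a second FMA.
// If x is 0 (y = inf) or inf (y = 0), x*y is NaN, so one NaN check
// covers both special cases.
float rsqrt_step_residual(float x, float y) {
  float xy = x * y;
  float h = std::fma(xy, y, -1.0f);        // x*y*y - 1
  float y_new = std::fma(-0.5f * y, h, y); // y - 0.5*y*h
  return std::isnan(y_new) ? y : y_new;
}
```

For example, rsqrt_step_residual(2.0f, 0.7f) moves the guess from 0.7 to about 0.707, and rsqrt_step_residual(0.0f, inf) returns the infinite guess unchanged.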

The tests below use scalar_rsqrt_op<float>. The AVX path uses the intrinsic _mm256_rsqrt_ps to compute an initial guess, followed by a single Newton iteration.
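The AVX path described above can be sketched with raw intrinsics. This is an illustration, not Eigen's implementation: the function rsqrt_array is hypothetical, it uses the residual form of the Newton step, and because plain AVX has no fused multiply-add it uses separate multiplies and adds. The vector loop is compiled only when __AVX__ is defined (for example with GCC or Clang and -mavx); otherwise the scalar loop handles every element.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#if defined(__AVX__)
#include <immintrin.h>
#endif

// Hypothetical sketch: out[i] = 1/sqrt(x[i]) using the ~12-bit hardware
// estimate from _mm256_rsqrt_ps, refined by one Newton-Raphson step in
// residual form, y + (-0.5 * y) * (x * y * y - 1).
void rsqrt_array(const float* x, float* out, std::size_t n) {
  std::size_t i = 0;
#if defined(__AVX__)
  const __m256 minus_half = _mm256_set1_ps(-0.5f);
  const __m256 minus_one = _mm256_set1_ps(-1.0f);
  for (; i + 8 <= n; i += 8) {
    __m256 a = _mm256_loadu_ps(x + i);
    __m256 y = _mm256_rsqrt_ps(a);  // initial guess
    // h = a*y*y - 1 (mul + add, since AVX alone has no FMA)
    __m256 h = _mm256_add_ps(_mm256_mul_ps(_mm256_mul_ps(a, y), y), minus_one);
    // y_new = y + (-0.5*y) * h
    __m256 y_new = _mm256_add_ps(y, _mm256_mul_ps(_mm256_mul_ps(minus_half, y), h));
    // a = 0 (y = inf) or a = inf (y = 0) makes y_new NaN: keep the estimate.
    __m256 is_nan = _mm256_cmp_ps(y_new, y_new, _CMP_UNORD_Q);
    _mm256_storeu_ps(out + i, _mm256_blendv_ps(y_new, y, is_nan));
  }
#endif
  // Scalar tail, and the whole array when AVX is not enabled at compile time.
  for (; i < n; ++i) out[i] = 1.0f / std::sqrt(x[i]);
}
```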

Accuracy: for every float input, the output was compared to the value computed by MPFR's rec_sqrt. Negative ulps indicate that the output is less than the reference.

  • Before: worst ulps = -2.28266
  • After: worst ulps = -1.97993
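A sketch of the accuracy metric, with one substitution: MPFR is not used here. Instead, a double-precision 1.0 / std::sqrt(double(x)) stands in for the reference; it is accurate to far better than one float ulp but is not correctly rounded the way MPFR's rec_sqrt is. The helper names ulp_error and worst_ulps are hypothetical, and the sweep samples positive normal floats with a stride instead of visiting every float.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Signed error of `result` in ulps of the float binade containing
// `reference` (valid for references in the normal float range).
// Negative means result < reference.
double ulp_error(float result, double reference) {
  int e = 0;
  std::frexp(reference, &e);             // reference = m * 2^e, m in [0.5, 1)
  double ulp = std::ldexp(1.0, e - 24);  // float has a 24-bit significand
  return (static_cast<double>(result) - reference) / ulp;
}

// Worst signed ulp error of `f` over positive normal floats, sampled every
// `stride` bit patterns (stride = 1 visits all of them).
double worst_ulps(float (*f)(float), std::uint32_t stride) {
  double worst = 0.0;
  for (std::uint64_t bits = 0x00800000u; bits <= 0x7F7FFFFFu; bits += stride) {
    std::uint32_t b = static_cast<std::uint32_t>(bits);
    float x;
    std::memcpy(&x, &b, sizeof x);  // reinterpret the bit pattern as a float
    double ref = 1.0 / std::sqrt(static_cast<double>(x));
    double err = ulp_error(f(x), ref);
    if (std::fabs(err) > std::fabs(worst)) worst = err;
  }
  return worst;
}
```

Passing a candidate rsqrt function to worst_ulps with stride 1 visits every positive normal float; zero, subnormals, and special values would need separate handling.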

Speed (times in ms; repeats = (1 << 30) / size):

Size        Old (ms)  New (ms)  Diff
32000       283.879   194.964   -31.3%
64000       216.68    180.354   -16.8%
128000      216.34    208.688   -3.5%
256000      212.886   193.805   -9.0%
512000      212.307   197.84    -6.8%
1024000     217.387   193.233   -11.1%
2048000     268.367   248.117   -7.5%
4096000     331.165   375.633   +13.4%
8192000     370.818   355.267   -4.2%
16384000    361.733   359.107   -0.7%
32768000    365.886   350.384   -4.2%
65536000    365.256   353.878   -3.1%
131072000   360.055   348.825   -3.1%
262144000   363.505   346.75    -4.6%
524288000   362.604   360.668   -0.5%
1048576000  365.782   349.517   -4.4%
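A sketch of how a table like this can be produced. The harness is hypothetical: the kernel is a plain scalar loop standing in for whatever Eigen expression was timed (an assumption), and the input values are arbitrary. Each row processes about 2^30 elements in total, so repeats = (1 << 30) / size; the parentheses matter in C++, where 1 << 30 / size parses as 1 << (30 / size).

```cpp
#include <cassert>
#include <chrono>
#include <cmath>
#include <cstddef>
#include <vector>

// Passes over a buffer of `size` floats so each row handles ~`total` elements.
std::size_t repeats_for(std::size_t size, std::size_t total = std::size_t{1} << 30) {
  return total / size;
}

// Hypothetical harness: time repeats_for(size, total) passes of 1/sqrt over
// a buffer of `size` floats and return the wall-clock time in milliseconds.
double benchmark_ms(std::size_t size, std::size_t total = std::size_t{1} << 30) {
  std::vector<float> in(size), out(size);
  for (std::size_t i = 0; i < size; ++i) in[i] = 1.0f + static_cast<float>(i);
  const std::size_t repeats = repeats_for(size, total);
  auto start = std::chrono::steady_clock::now();
  for (std::size_t r = 0; r < repeats; ++r)
    for (std::size_t i = 0; i < size; ++i) out[i] = 1.0f / std::sqrt(in[i]);
  auto stop = std::chrono::steady_clock::now();
  volatile float sink = out[0];  // read one result so the loop is not dead code
  (void)sink;
  return std::chrono::duration<double, std::milli>(stop - start).count();
}

// The "Diff" column: relative change of New against Old, in percent.
double percent_diff(double old_ms, double new_ms) {
  return (new_ms - old_ms) / old_ms * 100.0;
}
```

As a check, percent_diff(283.879, 194.964) gives about -31.3, matching the first row.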

Additional information

Edited by Charles Schlosser
