Optimize slerp() as proposed by Gopinath Vasalamarri.
Replaces !1881 (merged)
I want to add a short note explaining why this optimization does not significantly degrade accuracy, since this question came up in a related context.
In short, using float as an example, this MR replaces

```cpp
float cos_x = ...;
float t = std::acos(cos_x);
float sin_x = std::sin(t);
```

with

```cpp
float cos_x = ...;
float sin_x = std::sqrt(1.0f - cos_x * cos_x);
```
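For context, this substitution sits inside the usual slerp weight computation. The sketch below is a standalone illustration, not Eigen's actual `Quaternion::slerp`. The `Quat` alias, the `slerp_sketch` name, and the `1 - epsilon` threshold for falling back to linear interpolation are assumptions made for this example. In this sketch, `theta` is still computed with `std::acos` because the two interpolation weights need it; only the `sin(theta)` call is replaced.

```cpp
#include <array>
#include <cmath>
#include <limits>

// Hypothetical minimal quaternion type for illustration, stored as {x, y, z, w}.
using Quat = std::array<float, 4>;

// Sketch of spherical linear interpolation between unit quaternions q0 and q1.
// sin(theta) is obtained as sqrt(1 - cos^2) instead of std::sin(std::acos(...)).
Quat slerp_sketch(const Quat& q0, const Quat& q1, float t) {
  // Cosine of the angle between q0 and q1 viewed as 4D vectors.
  float d = 0.0f;
  for (int i = 0; i < 4; ++i) d += q0[i] * q1[i];
  const float abs_d = std::abs(d);

  float scale0, scale1;
  if (abs_d >= 1.0f - std::numeric_limits<float>::epsilon()) {
    // Nearly parallel (assumed threshold): fall back to linear interpolation.
    scale0 = 1.0f - t;
    scale1 = t;
  } else {
    const float theta = std::acos(abs_d);                     // still needed for the weights
    const float sin_theta = std::sqrt(1.0f - abs_d * abs_d);  // replaces std::sin(theta)
    scale0 = std::sin((1.0f - t) * theta) / sin_theta;
    scale1 = std::sin(t * theta) / sin_theta;
  }
  if (d < 0.0f) scale1 = -scale1;  // q1 and -q1 are the same rotation: take the shorter arc

  Quat out{};
  for (int i = 0; i < 4; ++i) out[i] = scale0 * q0[i] + scale1 * q1[i];
  return out;
}
```

For example, interpolating halfway between the identity `{0, 0, 0, 1}` and a 90° rotation about z, `{0, 0, 0.7071068, 0.7071068}`, gives approximately `{0, 0, 0.3826834, 0.9238795}`, which is a 45° rotation about z.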
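One might worry that the latter formula is less accurate when cos_x is near 1, i.e. when cos_x = cos(theta) for a small angle theta. However, the accuracy of the original computation is already limited by how precisely cos(theta) can be stored: floats just below 1 are spaced 2^-24 ≈ 6e-8 apart, so cos_x carries a representation error of that order before either formula is applied. For cos_x near 1, the product cos_x * cos_x is rounded to within 2^-25 and the subtraction from 1 is then exact, so the fast formula adds an error of the same order as the one already present in cos_x. The following code snippet tabulates the relative error of both formulas computed in float, compared with the original formula computed in double precision: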
https://godbolt.org/z/vd1jM16s1
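The 2^-24 figure is the gap between 1.0f and the next float below it. The small sketch below (the helper names are hypothetical) checks that gap with `std::nextafter` and, to illustrate how coarse cos_x is near 1, computes the angle whose cosine is the largest float below 1.

```cpp
#include <cmath>

// Gap between 1.0f and the largest float below it (expected to be 2^-24).
float float_gap_below_one() {
  return 1.0f - std::nextafter(1.0f, 0.0f);
}

// Angle in radians (computed in double) whose cosine is the largest float below 1.
// Any smaller nonzero angle has a cosine that rounds to this value or to 1.0f.
double smallest_resolvable_angle() {
  const float c = std::nextafter(1.0f, 0.0f);
  return std::acos(static_cast<double>(c));
}
```

`float_gap_below_one()` returns 2^-24 ≈ 5.96e-8, and `smallest_resolvable_angle()` returns about 3.45e-4 rad, so every angle between 0 and roughly 3.45e-4 rad maps to one of only two float values of cos_x.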
A subset of the table for cos(x) near 1.0:
| Argument x = cos(theta) | Relative error, sin(acos(x)) | Relative error, sqrt(1 - x^2) |
|---|---|---|
| 0.9999990 | -2.703e-08 | 2.183e-07 |
| 0.9900980 | -5.809e-08 | 4.806e-08 |
| 0.9802951 | 1.467e-08 | 2.410e-07 |
| 0.9705892 | 1.515e-09 | 1.253e-07 |
| 0.9609793 | 2.488e-08 | 2.488e-08 |
| 0.9514647 | -1.930e-08 | -1.161e-07 |
| 0.9420443 | -1.479e-08 | 1.629e-07 |
| 0.9327171 | -5.311e-08 | 2.954e-08 |
| 0.9234822 | 1.286e-08 | 1.286e-08 |
| 0.9143388 | -3.796e-08 | 1.092e-07 |
| 0.9052860 | 2.980e-08 | 2.980e-08 |
| 0.8963227 | -1.029e-08 | -1.029e-08 |
| 0.8874483 | 5.239e-09 | 5.239e-09 |
| 0.8786616 | -3.066e-09 | -3.066e-09 |
| ... | ... | ... |
As the table shows, sin(t) computed with the fast formula remains accurate to within a few ULPs near 1: the largest relative error in this sample is 2.410e-07, compared with 5.809e-08 for sin(acos(x)).
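The code behind the godbolt link is not reproduced here. A minimal sketch of a similar measurement follows, assuming the reference is sin(acos(x)) evaluated in double for the same float input x. The function names and the linear step between rows are illustrative choices, and the printed numbers may differ from the table above depending on the compiler, the math library, and whether `1.0f - x * x` is contracted into an FMA.

```cpp
#include <cmath>
#include <cstdio>

// Relative error of sin(acos(x)) evaluated in float, against a double reference.
double rel_err_acos_sin(float x) {
  const double ref = std::sin(std::acos(static_cast<double>(x)));
  const float approx = std::sin(std::acos(x));
  return (static_cast<double>(approx) - ref) / ref;
}

// Relative error of sqrt(1 - x*x) evaluated in float, against the same reference.
double rel_err_sqrt(float x) {
  const double ref = std::sin(std::acos(static_cast<double>(x)));
  const float approx = std::sqrt(1.0f - x * x);
  return (static_cast<double>(approx) - ref) / ref;
}

// Print both errors for arguments stepping down linearly from the largest float below 1.
void print_error_table(int rows, float step) {
  std::printf("%-12s %-14s %-14s\n", "x", "sin(acos(x))", "sqrt(1-x^2)");
  for (int i = 0; i < rows; ++i) {
    const float x = std::nextafter(1.0f, 0.0f) - static_cast<float>(i) * step;
    std::printf("%-12.7f %-14.3e %-14.3e\n", x, rel_err_acos_sin(x), rel_err_sqrt(x));
  }
}
```

For example, `print_error_table(5, 0.01f)` prints five rows starting just below 1.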