Simpler range reduction strategy for atan<float>().

Reference issue

What does this implement/fix?

Additional information

This change saves a division and some pselect logic in exchange for a couple of extra FMAs. The maximum relative error remains <= 2 ulps, and the speedup is 20-40% on x86. $2421160
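For readers unfamiliar with the reduction, the idea can be sketched as a scalar C++ function. For |x| <= 1 a polynomial approximates atan(x) directly. For |x| > 1 the identity atan(x) = pi/2 - atan(1/x) maps the argument back into [0, 1], so only one reciprocal and one select condition are needed. The polynomial must then be accurate on all of [0, 1], which is where the extra FMAs come from. This is only a sketch under stated assumptions: the function names are hypothetical, the degree-15 odd polynomial coefficients are illustrative and are not claimed to be the ones used in this merge request, and the real Eigen code operates on SIMD packets (pselect, pmadd, ...) rather than scalars.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical scalar sketch; coefficients are illustrative, not Eigen's.
// Odd polynomial P(x) = x + x^3 * Q(x^2) approximating atan(x) on [0, 1].
inline float atan_kernel_01(float x) {
  const float x2 = x * x;
  // Horner evaluation of Q(x^2); each step is one FMA.
  float q = 2.90188402868807315826416015625e-3f;
  q = std::fma(q, x2, -1.62907354533672332763671875e-2f);
  q = std::fma(q, x2, 4.3082617223262786865234375e-2f);
  q = std::fma(q, x2, -7.5408883392810821533203125e-2f);
  q = std::fma(q, x2, 0.1066047251224517822265625f);
  q = std::fma(q, x2, -0.14209578931331634521484375f);
  q = std::fma(q, x2, 0.19993579387664794921875f);
  q = std::fma(q, x2, -0.3333314359188079833984375f);
  return std::fma(x * x2, q, x);  // x + x^3 * Q(x^2)
}

// atan(x) via the two-interval reduction atan(x) = pi/2 - atan(1/x), |x| > 1.
inline float atan_range_reduced(float x) {
  const float kPiOverTwo = 1.57079632679489661923f;
  const float ax = std::fabs(x);
  const bool large = ax > 1.0f;            // the only range-reduction condition
  const float r = large ? 1.0f / ax : ax;  // the only division (a SIMD version
                                           // computes it for every lane)
  const float p = atan_kernel_01(r);       // r is always in [0, 1]
  const float result = large ? kPiOverTwo - p : p;
  return std::copysign(result, x);         // atan is odd; also keeps atan(-0) = -0
}

// Largest relative error against double-precision std::atan on n evenly
// spaced samples in [lo, hi].
inline double max_relative_error(float lo, float hi, int n) {
  assert(n >= 2 && "need at least two sample points");
  double worst = 0.0;
  for (int i = 0; i < n; ++i) {
    const float x = lo + (hi - lo) * static_cast<float>(i) / static_cast<float>(n - 1);
    const double ref = std::atan(static_cast<double>(x));
    const double got = atan_range_reduced(x);
    const double err = std::fabs(got - ref) / std::fmax(std::fabs(ref), 1e-30);
    if (err > worst) worst = err;
  }
  return worst;
}
```

For example, `max_relative_error(-100.0f, 100.0f, 200001)` sweeps both reduction branches. The check uses relative error rather than ulps to keep the sketch short; measuring the 2-ulp bound quoted above would require comparing float bit patterns instead.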

Unfortunately, the same change is not viable for double without going to a very high polynomial degree, negating the benefit.

Also, this change refactors the inner polynomial approximations for atan<float>() and atan<double>() into separate functions, so they can be reused later in a more efficient implementation of atan2().
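One common way an atan2() can reuse a kernel that is valid on [0, 1] is to divide the smaller of |x| and |y| by the larger. A single division then yields an argument in [0, 1], and the octant and quadrant are fixed up afterwards with pi/2 - r and pi - r. The sketch below illustrates that idea only. It is not the atan2() implementation this refactoring prepares for. The names are hypothetical, the kernel coefficients are illustrative (repeated so the block compiles on its own), and it assumes finite, non-NaN inputs with no special handling of signed zeros or infinities, unlike std::atan2.

```cpp
#include <cassert>
#include <cmath>

// Illustrative odd polynomial for atan on [0, 1] (not Eigen's coefficients).
inline float atan_kernel_01(float x) {
  const float x2 = x * x;
  float q = 2.90188402868807315826416015625e-3f;
  q = std::fma(q, x2, -1.62907354533672332763671875e-2f);
  q = std::fma(q, x2, 4.3082617223262786865234375e-2f);
  q = std::fma(q, x2, -7.5408883392810821533203125e-2f);
  q = std::fma(q, x2, 0.1066047251224517822265625f);
  q = std::fma(q, x2, -0.14209578931331634521484375f);
  q = std::fma(q, x2, 0.19993579387664794921875f);
  q = std::fma(q, x2, -0.3333314359188079833984375f);
  return std::fma(x * x2, q, x);
}

// Hypothetical atan2 built on the [0, 1] kernel; finite, non-NaN inputs only.
inline float atan2_via_kernel(float y, float x) {
  const float kPi = 3.14159265358979323846f;
  const float kPiOverTwo = 1.57079632679489661923f;
  const float ax = std::fabs(x);
  const float ay = std::fabs(y);
  const float hi = std::fmax(ax, ay);
  const float lo = std::fmin(ax, ay);
  const float t = (hi == 0.0f) ? 0.0f : lo / hi;  // single division, t in [0, 1]
  float r = atan_kernel_01(t);
  if (ay > ax) r = kPiOverTwo - r;  // angle measured from the y-axis instead
  if (x < 0.0f) r = kPi - r;        // reflect into the left half-plane
  return std::copysign(r, y);       // lower half-plane follows the sign of y
}

// Largest absolute error against double-precision std::atan2 for n points
// on a circle of the given radius.
inline double max_abs_error_on_circle(int n, double radius) {
  assert(n >= 2 && radius > 0.0);
  const double kPiD = 3.14159265358979323846;
  double worst = 0.0;
  for (int i = 0; i < n; ++i) {
    const double theta = -kPiD + 2.0 * kPiD * i / (n - 1);
    const float y = static_cast<float>(radius * std::sin(theta));
    const float x = static_cast<float>(radius * std::cos(theta));
    const double ref = std::atan2(static_cast<double>(y), static_cast<double>(x));
    const double err = std::fabs(atan2_via_kernel(y, x) - ref);
    if (err > worst) worst = err;
  }
  return worst;
}
```

Compared with computing atan(y / x) and then reducing again, the min/max ratio avoids a second reciprocal, because the argument is already in the kernel's domain.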

Edited by Rasmus Munk Larsen
