
Consolidate float and double implementations of patan().

The new implementations share the same range reduction to [-1, 1] and differ only in the rational approximation used for x in [-1, 1].
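The structure described above can be sketched in scalar C++ as follows. This is only an illustration, not the code in this merge request: the names `atan_core` and `atan_sketch` are hypothetical, the reduction uses the standard identity atan(x) = pi/2 - atan(1/x) for x > 1 (the exact reduction used by patan() is not shown here), and the core is a deliberately low-order Padé-style rational function standing in for the separate, much more accurate float and double approximations.

```cpp
#include <cmath>

// Hypothetical stand-in for the type-specific core on [-1, 1].
// Low-order rational approximation x * (15 + 4x^2) / (15 + 9x^2);
// it matches the Taylor series of atan up to x^5 and is only accurate
// to about 6e-3 at |x| = 1. A real implementation would use a
// higher-order rational function tuned separately for float and double.
template <typename T>
T atan_core(T x) {
  const T x2 = x * x;
  return x * (T(15) + T(4) * x2) / (T(15) + T(9) * x2);
}

// Shared range reduction (assumed form): for |x| > 1 evaluate the core
// at 1/|x| and use atan(|x|) = pi/2 - atan(1/|x|); restore the sign last.
template <typename T>
T atan_sketch(T x) {
  const T half_pi = T(1.5707963267948966);
  const T ax = std::abs(x);
  const bool reduce = ax > T(1);
  const T r = reduce ? T(1) / ax : ax;  // r is always in [0, 1]
  T y = atan_core(r);
  if (reduce) y = half_pi - y;
  return std::copysign(y, x);
}
```

Because only `atan_core` depends on the required accuracy, the same `atan_sketch` template serves both `float` and `double`, which mirrors the consolidation described here.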

Results differ by less than 3 ULPs from std::atan.
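For readers who want to reproduce a ULP comparison like this one, a minimal sketch for `float` follows. It assumes IEEE 754 binary32 and counts the number of representable values between two finite floats, which is one common convention; the helper names `ordered_bits` and `ulp_distance` are hypothetical, and the method actually used to obtain the 3 ULP bound is not stated here.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>

// Map a finite float onto an integer line that preserves ordering:
// non-negative floats keep their bit pattern, negative floats are
// mirrored so that -0.0f and +0.0f both map to 0.
inline std::int64_t ordered_bits(float f) {
  std::int32_t i;
  std::memcpy(&i, &f, sizeof(f));  // well-defined type punning
  return i < 0 ? -static_cast<std::int64_t>(i & 0x7fffffff)
               : static_cast<std::int64_t>(i);
}

// Number of representable float steps between a and b (finite inputs only).
inline std::int64_t ulp_distance(float a, float b) {
  const std::int64_t d = ordered_bits(a) - ordered_bits(b);
  return d < 0 ? -d : d;
}
```

A check in this style would loop over test inputs and compare the candidate result against `std::atan` (or a higher-precision reference rounded to `float`) using `ulp_distance`.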

This gives a speedup for most combinations of type and ISA; AVX512 is the exception, with no change for float and a 3% slowdown for double.

ISA        Type     Speedup
SSE 4.2    float    30%
AVX2+FMA   float    25%
AVX512     float    0%
SSE 4.2    double   3.5%
AVX2+FMA   double   20%
AVX512     double   -3%

Full benchmark results are here.

Edited by Rasmus Munk Larsen
