Vectorize atanh<double>. Make atanh(x) standard compliant for |x| >= 1.

This implements a vectorized version of atanh<double>. This MR also makes the handling of arguments with |x| >= 1 standard compliant: atanh(±1) returns ±infinity, and |x| > 1 returns NaN. The fix slows down the existing float implementation by 10-15%.
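
For reference, the edge-case behavior that the C and C++ standards require can be sketched in scalar code. This is only an illustration, not the packet code in this MR: the function name `atanh_reference` is hypothetical, and the log1p-based formula for the interior of the domain is a textbook identity, not necessarily the approximation used here.

```cpp
#include <cmath>
#include <limits>

// Hypothetical scalar reference, not the vectorized kernel from this MR.
// It only illustrates the results required for |x| >= 1.
double atanh_reference(double x) {
  const double ax = std::fabs(x);
  // |x| > 1 is a domain error: the result is NaN.
  if (ax > 1.0) return std::numeric_limits<double>::quiet_NaN();
  // |x| == 1 is a pole error: the result is infinity with the sign of x.
  if (ax == 1.0) {
    return std::copysign(std::numeric_limits<double>::infinity(), x);
  }
  // |x| < 1 (NaN inputs also fall through and propagate):
  // atanh(x) = 0.5 * log((1 + x) / (1 - x)) = 0.5 * log1p(2x / (1 - x)).
  return 0.5 * std::log1p(2.0 * x / (1.0 - x));
}
```

A vectorized implementation cannot branch per lane like this and would typically compute the same results with comparison masks and selects, which is one plausible source of the extra cost on the float path.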

Speedups are as follows (negative percentages are slowdowns):

| ISA | Speedup (double) | Speedup (float) |
|---|---|---|
| SSE 4.2 | 1.8x | -15% |
| AVX2 + FMA | 2.0x | -10% |
| AVX512 | 3.4x | -10% |

Detailed measurements: $3742833

Edited by Rasmus Munk Larsen
