Vectorize atanh<double>. Make atanh(x) standard-compliant for |x| >= 1.
This MR implements a vectorized version of atanh<double>. It also fixes the handling of arguments with |x| >= 1 to be standard-compliant, which slows down the existing float implementation by 10-15%.
Measured speedups are as follows (negative percentages denote slowdowns):
| ISA | Speedup (double) | Speedup (float) |
|---|---|---|
| SSE 4.2 | 1.8x | -15% |
| AVX2 + FMA | 2.0x | -10% |
| AVX512 | 3.4x | -10% |
Detailed measurements: $3742833
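
For reviewers who want a quick reference for the |x| >= 1 behavior mentioned above, here is a minimal scalar sketch. It is not the packet code from this MR. It assumes "standard compliant" means the special values that C/C++ `std::atanh` follows under IEEE 754: ±1 maps to ±inf, and |x| > 1 maps to NaN. The helper name `atanh_reference` is hypothetical and exists only for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Hypothetical scalar reference (not part of Eigen). It spells out the
// special cases that a standard-compliant atanh is assumed to honor.
inline double atanh_reference(double x) {
  const double ax = std::fabs(x);
  // |x| > 1 is outside the domain: the result is NaN.
  if (ax > 1.0) return std::numeric_limits<double>::quiet_NaN();
  // x == +/-1 is a pole: the result is an infinity with the sign of x.
  if (ax == 1.0) {
    return std::copysign(std::numeric_limits<double>::infinity(), x);
  }
  // Interior of the domain: atanh(x) = 0.5 * log((1 + x) / (1 - x)),
  // written with log1p. A NaN input falls through to here and yields NaN.
  return 0.5 * std::log1p(2.0 * x / (1.0 - x));
}

// Compares the sketch against std::atanh on the edge cases of interest.
inline void check_atanh_edge_cases() {
  assert(std::isinf(atanh_reference(1.0)) && std::isinf(std::atanh(1.0)));
  assert(std::isinf(atanh_reference(-1.0)) && atanh_reference(-1.0) < 0.0);
  assert(std::isnan(atanh_reference(2.0)) && std::isnan(std::atanh(2.0)));
  assert(std::fabs(atanh_reference(0.5) - std::atanh(0.5)) < 1e-12);
}
```

A vectorized implementation would typically produce the same results with branch-free compares and selects rather than `if` statements, which may be where the extra cost on the float path comes from.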
Edited by Rasmus Munk Larsen