Vectorize atanh & add a missing definition and unit test for atan.
This change adds:
- A vectorized implementation of atanh.
- A missing definition for atan<half>.
- Unit tests for patan.
- A new helper function for testing unary functors on special IEEE values. Tests are added for most common mathematical functions.
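
To illustrate the idea behind the special-value helper, here is a minimal standalone sketch. It does not reproduce the helper added in this change: the names `special_values`, `matches`, and `count_special_value_mismatches` are hypothetical, the list of inputs is an assumed selection, and it uses only the C++ standard library rather than Eigen's test utilities.

```cpp
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical selection of IEEE special values; the set used by the
// actual helper may differ.
inline std::vector<float> special_values() {
  using L = std::numeric_limits<float>;
  return {0.0f, -0.0f, 1.0f, -1.0f,
          L::denorm_min(), -L::denorm_min(),
          L::min(), L::max(), L::lowest(), L::epsilon(),
          L::infinity(), -L::infinity(), L::quiet_NaN()};
}

// Compares a result against a reference value: NaN must match NaN,
// zeros and infinities must match exactly (including sign), and finite
// values must agree to within a relative tolerance.
inline bool matches(float got, float want, float rel_tol) {
  if (std::isnan(want)) return std::isnan(got);
  if (std::isnan(got)) return false;
  if (got == want) return std::signbit(got) == std::signbit(want);
  if (std::isinf(got) || std::isinf(want)) return false;
  return std::fabs(got - want) <= rel_tol * std::fabs(want);
}

// Applies `f` (float -> float) and `ref` (double -> double) to every
// special value and returns the number of inputs where they disagree.
template <typename F, typename Ref>
std::size_t count_special_value_mismatches(
    F f, Ref ref,
    float rel_tol = 4.0f * std::numeric_limits<float>::epsilon()) {
  std::size_t mismatches = 0;
  for (float x : special_values()) {
    const float got = f(x);
    const float want = static_cast<float>(ref(static_cast<double>(x)));
    if (!matches(got, want, rel_tol)) ++mismatches;
  }
  return mismatches;
}
```

A caller would pass the functor under test as `f` and a double-precision standard library function as `ref`, then expect a mismatch count of zero.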
The vectorized function is accurate to 2 ULP.
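
For readers who want to check an accuracy claim like this one, the sketch below shows one common way to measure error in ULPs for `float`. It is an assumption-laden illustration and not the procedure used for this change: `ordered_bits`, `ulp_distance`, and `max_ulp_error` are hypothetical names, the sampling is a simple uniform sweep, and the reference is a double-precision function rounded to `float`.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Maps a float to an integer that increases monotonically with the
// float's value, so adjacent floats map to adjacent integers.
// +0 and -0 both map to 0.
inline std::int64_t ordered_bits(float x) {
  std::uint32_t u;
  std::memcpy(&u, &x, sizeof u);
  const std::int64_t magnitude = u & 0x7fffffffu;
  return (u & 0x80000000u) ? -magnitude : magnitude;
}

// Number of representable floats between a and b (0 if they are equal).
// NaN inputs are not meaningful here and are rejected.
inline std::int64_t ulp_distance(float a, float b) {
  assert(!std::isnan(a) && !std::isnan(b));
  const std::int64_t d = ordered_bits(a) - ordered_bits(b);
  return d < 0 ? -d : d;
}

// Evaluates `f` (float -> float) at n midpoints of equal subintervals
// of [lo, hi] and returns the largest ULP distance to `ref`
// (double -> double) rounded to float.
template <typename F, typename Ref>
std::int64_t max_ulp_error(F f, Ref ref, float lo, float hi, int n) {
  std::int64_t worst = 0;
  for (int i = 0; i < n; ++i) {
    const float t = (static_cast<float>(i) + 0.5f) / static_cast<float>(n);
    const float x = lo + (hi - lo) * t;
    const float got = f(x);
    const float want = static_cast<float>(ref(static_cast<double>(x)));
    worst = std::max(worst, ulp_distance(got, want));
  }
  return worst;
}
```

A uniform sweep like this undersamples the regions near the endpoints of the domain, so a thorough check would also sample densely near ±1 and near zero.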
Benchmark numbers:

SSE:

| name | old cpu/op | new cpu/op | delta |
| --- | --- | --- | --- |
| BM_eigen_atanh_float/1 | 0.29ns ± 1% | 2.45ns ± 0% | +748.86% (p=0.000 n=53+46) |
| BM_eigen_atanh_float/8 | 36.4ns ± 0% | 34.3ns ± 2% | -5.92% (p=0.000 n=43+38) |
| BM_eigen_atanh_float/64 | 334ns ± 1% | 205ns ± 4% | -38.61% (p=0.000 n=44+60) |
| BM_eigen_atanh_float/512 | 2.65µs ± 1% | 1.60µs ± 1% | -39.82% (p=0.000 n=51+38) |
| BM_eigen_atanh_float/4k | 21.2µs ± 0% | 12.7µs ± 4% | -40.00% (p=0.000 n=56+59) |
| BM_eigen_atanh_float/32k | 169µs ± 0% | 102µs ± 1% | -39.80% (p=0.000 n=51+43) |
| BM_eigen_atanh_float/256k | 1.36ms ± 0% | 0.82ms ± 1% | -39.84% (p=0.000 n=56+48) |
| BM_eigen_atanh_float/1M | 5.43ms ± 0% | 3.26ms ± 2% | -39.87% (p=0.000 n=55+43) |

AVX2:

| name | old cpu/op | new cpu/op | delta |
| --- | --- | --- | --- |
| BM_eigen_atanh_float/1 | 2.12ns ± 1% | 1.91ns ± 0% | -9.68% (p=0.000 n=54+55) |
| BM_eigen_atanh_float/8 | 38.1ns ± 0% | 39.4ns ± 1% | +3.55% (p=0.000 n=45+43) |
| BM_eigen_atanh_float/64 | 334ns ± 3% | 137ns ± 4% | -59.10% (p=0.000 n=50+59) |
| BM_eigen_atanh_float/512 | 2.66µs ± 2% | 0.84µs ± 5% | -68.51% (p=0.000 n=51+57) |
| BM_eigen_atanh_float/4k | 21.3µs ± 1% | 6.5µs ± 5% | -69.53% (p=0.000 n=58+59) |
| BM_eigen_atanh_float/32k | 170µs ± 1% | 51µs ± 4% | -69.90% (p=0.000 n=57+53) |
| BM_eigen_atanh_float/256k | 1.36ms ± 1% | 0.41ms ± 4% | -69.81% (p=0.000 n=58+47) |
| BM_eigen_atanh_float/1M | 5.46ms ± 1% | 1.65ms ± 5% | -69.71% (p=0.000 n=60+59) |

AVX512:

| name | old cpu/op | new cpu/op | delta |
| --- | --- | --- | --- |
| BM_eigen_atanh_float/1 | 2.18ns ± 1% | 3.28ns ± 1% | +49.98% (p=0.000 n=56+47) |
| BM_eigen_atanh_float/8 | 38.1ns ± 0% | 41.4ns ± 1% | +8.87% (p=0.000 n=47+44) |
| BM_eigen_atanh_float/64 | 332ns ± 1% | 148ns ± 3% | -55.49% (p=0.000 n=43+54) |
| BM_eigen_atanh_float/512 | 2.66µs ± 1% | 0.47µs ± 5% | -82.21% (p=0.000 n=52+56) |
| BM_eigen_atanh_float/4k | 21.3µs ± 1% | 3.1µs ± 3% | -85.39% (p=0.000 n=54+50) |
| BM_eigen_atanh_float/32k | 170µs ± 0% | 24µs ± 3% | -85.86% (p=0.000 n=54+55) |
| BM_eigen_atanh_float/256k | 1.36ms ± 1% | 0.19ms ± 3% | -85.90% (p=0.000 n=56+60) |
| BM_eigen_atanh_float/1M | 5.46ms ± 1% | 0.76ms ± 4% | -85.98% (p=0.000 n=57+58) |

Edited by Rasmus Munk Larsen