Vectorize tan(x)

This implementation has a maximum error of 4 ULP for AVX2+FMA.
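
A bound like this can be spot-checked by comparing each float result against a double-precision reference rounded to float. The sketch below is only an illustration, not the test used for this change: `ordered_bits`, `ulp_distance`, and `max_ulp_error` are hypothetical helper names, and the distance is counted in whole representable floats, which approximates the usual ULP error.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstdlib>
#include <cstring>

// Map a float's bit pattern to an integer that increases monotonically with
// the float's value (+0.0f and -0.0f both map to 0).
static std::int64_t ordered_bits(float x) {
  std::int32_t i;
  std::memcpy(&i, &x, sizeof(i));
  return i < 0 ? -static_cast<std::int64_t>(i & 0x7fffffff)
               : static_cast<std::int64_t>(i);
}

// Number of representable floats separating a and b (0 if they are equal).
static std::int64_t ulp_distance(float a, float b) {
  return std::llabs(ordered_bits(a) - ordered_bits(b));
}

// Hypothetical harness: largest ULP distance between tan_impl(x) and a
// reference computed in double and rounded to float, over evenly spaced
// samples in [lo, hi]. Non-finite results are skipped.
template <typename F>
std::int64_t max_ulp_error(F tan_impl, float lo, float hi, int samples) {
  assert(samples > 1);
  std::int64_t worst = 0;
  for (int k = 0; k < samples; ++k) {
    const float x = lo + (hi - lo) * (static_cast<float>(k) / (samples - 1));
    const float ref = static_cast<float>(std::tan(static_cast<double>(x)));
    const float got = tan_impl(x);
    if (!std::isfinite(ref) || !std::isfinite(got)) continue;
    const std::int64_t d = ulp_distance(got, ref);
    if (d > worst) worst = d;
  }
  return worst;
}
```

Passing a callable that wraps the vectorized kernel as `tan_impl` would give an empirical worst case over the sampled range; a sampled check like this cannot prove the bound for all inputs.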

Benchmark measurements:

name                      base cpu/op    new cpu/op  vs base
BM_eigen_tan_float/1       6.759n ± 0%   10.999n ± 1%  +62.73% (p=0.000 n=72)
BM_eigen_tan_float/8       44.14n ± 0%    10.67n ± 1%  -75.84% (n=72)
BM_eigen_tan_float/64     350.33n ± 0%    59.72n ± 2%  -82.95% (n=60+72)
BM_eigen_tan_float/512    2761.0n ± 0%    436.4n ± 1%  -84.20% (n=66+72)
BM_eigen_tan_float/4k     22.136µ ± 0%    3.472µ ± 1%  -84.32% (n=71+60)
BM_eigen_tan_float/32k    176.69µ ± 0%    27.56µ ± 1%  -84.41% (n=72+65)
BM_eigen_tan_float/256k   1413.5µ ± 0%    221.5µ ± 2%  -84.33% (n=72+70)
BM_eigen_tan_float/1M     5653.5µ ± 0%    877.6µ ± 2%  -84.48% (n=72)
geomean                   7.403µ         1.657µ       -77.62%

name                     base cpu/op   new cpu/op  vs base
BM_eigen_tan_double/1     18.18n ± 0%   19.84n ± 0%   +9.11% (p=0.000 n=72)
BM_eigen_tan_double/8    137.76n ± 0%   36.19n ± 1%  -73.73% (n=72+59)
BM_eigen_tan_double/64   1100.6n ± 0%   262.1n ± 1%  -76.19% (n=72+66)
BM_eigen_tan_double/512   8.769µ ± 0%   2.039µ ± 1%  -76.74% (n=72)
BM_eigen_tan_double/4k    70.10µ ± 0%   16.39µ ± 1%  -76.62% (n=72)
BM_eigen_tan_double/32k   560.8µ ± 0%   130.8µ ± 1%  -76.67% (n=72)
BM_eigen_tan_double/256k  4.487m ± 0%   1.045m ± 1%  -76.70% (n=72)
BM_eigen_tan_double/1M   17.970m ± 0%   4.212m ± 1%  -76.56% (n=70+69)
geomean                  22.94µ        6.605µ       -71.21%

Edited by Rasmus Munk Larsen
