Improve exp<float>(): Don't flush denormal results + 4% speedup.

- Speed up exp(x) by reducing the polynomial approximant from degree 7 to degree 6. With exactly representable coefficients computed by the Sollya tool, this still gives a maximum relative error of 1 ulp, i.e. faithfully rounded, for arguments where exp(x) is a normalized float. This change results in a speedup of about 4% for AVX2.
- Extend the range where exp(x) returns a non-zero result from ~[-88;88] to ~[-104;88], i.e. return denormalized values for large negative arguments instead of zero. Compared to exp<double>(x), the denormalized results gradually lose accuracy, reaching a relative error of 0.033 for arguments around x = -104, where exp(x) is ~std::numeric_limits<float>::denorm_min(). This is expected and acceptable.
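
For readers unfamiliar with the technique, the two points above can be illustrated with a minimal scalar sketch. It is not the Eigen packet code: the names exp_sketch and rel_error_vs_double are hypothetical, the Taylor coefficients 1/k! stand in for the Sollya-derived coefficients (which are not listed here), and the Cody-Waite split of ln(2) is an assumption about how the range reduction might be done. NaN and overflow handling are omitted.

```cpp
#include <cmath>
#include <limits>

// Hypothetical scalar sketch; not the code changed by this commit.
inline float exp_sketch(float x) {
  // Clamp to the extended range ~[-104; 88] described above.
  // Overflow and NaN handling are deliberately left out of this sketch.
  x = std::fmin(std::fmax(x, -104.0f), 88.0f);

  // Range reduction: x = n * ln(2) + r with |r| <= ln(2) / 2.
  // ln(2) is split in two parts (assumed Cody-Waite style split);
  // kLn2Hi has few significant bits, so n * kLn2Hi is exact for the
  // values of n that occur here.
  const float kLog2e = 1.44269504f;
  const float kLn2Hi = 0.693359375f;
  const float kLn2Lo = -2.12194440e-4f;  // ln(2) - kLn2Hi
  const float n = std::round(x * kLog2e);
  float r = x - n * kLn2Hi;
  r -= n * kLn2Lo;

  // Degree-6 polynomial in Horner form: six multiply-adds.
  // Taylor coefficients are placeholders for the minimax coefficients.
  float p = 1.0f / 720.0f;
  p = p * r + 1.0f / 120.0f;
  p = p * r + 1.0f / 24.0f;
  p = p * r + 1.0f / 6.0f;
  p = p * r + 0.5f;
  p = p * r + 1.0f;
  p = p * r + 1.0f;

  // Scale by 2^n. std::ldexp rounds into the denormal range rather than
  // flushing small results to zero.
  return std::ldexp(p, static_cast<int>(n));
}

// Relative error of the sketch against a double-precision reference.
inline double rel_error_vs_double(float x) {
  const double ref = std::exp(static_cast<double>(x));
  return std::fabs(static_cast<double>(exp_sketch(x)) - ref) / ref;
}
```

With default IEEE-754 denormal handling (no flush-to-zero mode), exp_sketch(-100.0f) should come out as a small positive value below std::numeric_limits<float>::min(). Denormals are spaced a fixed denorm_min() apart, so the relative rounding error grows as the result shrinks, which is the gradual accuracy loss described in the second bullet.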

Benchmark numbers for AVX2:

name                     old cpu/op   new cpu/op   delta
BM_eigen_exp_float/1     3.27ns ± 0%  3.27ns ± 0%  ~       (p=0.218 n=46+48)
BM_eigen_exp_float/8     29.6ns ± 0%  30.1ns ± 6%  +1.56%  (p=0.000 n=41+54)
BM_eigen_exp_float/64    80.4ns ± 5%  79.7ns ± 5%  -0.85%  (p=0.007 n=47+60)
BM_eigen_exp_float/512   460ns ± 2%   441ns ± 2%   -4.31%  (p=0.000 n=60+57)
BM_eigen_exp_float/4k    3.48µs ± 2%  3.35µs ± 2%  -3.52%  (p=0.000 n=49+49)
BM_eigen_exp_float/32k   27.6µs ± 3%  26.6µs ± 3%  -3.75%  (p=0.000 n=54+54)
BM_eigen_exp_float/256k  221µs ± 2%   212µs ± 2%   -3.81%  (p=0.000 n=48+56)
BM_eigen_exp_float/1M    887µs ± 3%   848µs ± 2%   -4.33%  (p=0.000 n=39+54)

name                     old time/op  new time/op  delta
BM_eigen_exp_float/1     3.27ns ± 0%  3.27ns ± 0%  ~       (p=0.475 n=49+48)
BM_eigen_exp_float/8     29.6ns ± 0%  30.1ns ± 6%  +1.54%  (p=0.000 n=41+54)
BM_eigen_exp_float/64    80.4ns ± 5%  79.7ns ± 5%  -0.89%  (p=0.006 n=48+60)
BM_eigen_exp_float/512   460ns ± 2%   441ns ± 2%   -4.31%  (p=0.000 n=60+57)
BM_eigen_exp_float/4k    3.48µs ± 2%  3.35µs ± 2%  -3.52%  (p=0.000 n=49+49)
BM_eigen_exp_float/32k   27.6µs ± 3%  26.6µs ± 3%  -3.73%  (p=0.000 n=54+54)
BM_eigen_exp_float/256k  221µs ± 2%   212µs ± 2%   -3.83%  (p=0.000 n=48+56)
BM_eigen_exp_float/1M    887µs ± 3%   848µs ± 2%   -4.33%  (p=0.000 n=39+54)

name                     old INSTRUCTIONS/op  new INSTRUCTIONS/op  delta
BM_eigen_exp_float/1     41.0 ± 0%            41.0 ± 0%            ~       (all samples are equal)
BM_eigen_exp_float/8     308 ± 0%             308 ± 0%             ~       (all samples are equal)
BM_eigen_exp_float/64    660 ± 0%             632 ± 0%             -4.24%  (p=0.000 n=60+60)
BM_eigen_exp_float/512   3.29k ± 0%           3.04k ± 0%           -7.65%  (p=0.000 n=53+55)
BM_eigen_exp_float/4k    24.3k ± 0%           22.3k ± 0%           -8.39%  (p=0.000 n=45+45)
BM_eigen_exp_float/32k   193k ± 0%            176k ± 0%            -8.50%  (p=0.000 n=49+48)
BM_eigen_exp_float/256k  1.54M ± 0%           1.41M ± 0%           -8.51%  (p=0.000 n=44+54)
BM_eigen_exp_float/1M    6.16M ± 0%           5.64M ± 0%           -8.51%  (p=0.000 n=37+52)

name                     old CYCLES/op  new CYCLES/op  delta
BM_eigen_exp_float/1     12.0 ± 0%      12.0 ± 0%      ~       (p=0.830 n=49+49)
BM_eigen_exp_float/8     109 ± 0%       111 ± 6%       +1.52%  (p=0.000 n=40+54)
BM_eigen_exp_float/64    270 ± 2%       269 ± 5%       ~       (p=0.051 n=47+60)
BM_eigen_exp_float/512   1.55k ± 2%     1.49k ± 2%     -4.10%  (p=0.000 n=57+60)
BM_eigen_exp_float/4k    11.7k ± 1%     11.3k ± 1%     -3.78%  (p=0.000 n=50+40)
BM_eigen_exp_float/32k   93.0k ± 1%     89.5k ± 1%     -3.76%  (p=0.000 n=52+48)
BM_eigen_exp_float/256k  744k ± 1%      715k ± 1%      -3.93%  (p=0.000 n=49+46)
BM_eigen_exp_float/1M    2.99M ± 1%     2.87M ± 1%     -4.02%  (p=0.000 n=40+58)