Prevent premature overflow to infinity in exp(x). The changes also provide a speedup of roughly 3-5% at the larger benchmark sizes (under 1% at the smallest).
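For readers unfamiliar with the failure mode, here is a minimal sketch of one common way premature overflow can arise in a range-reduced exp, and one way to avoid it. It is an illustration under stated assumptions, not the code in this merge request: the helpers `reduce`, `exp_naive`, and `exp_split` are hypothetical names, and `std::exp` on the reduced argument stands in for a polynomial approximation.

```cpp
#include <cmath>

// Hypothetical helpers for illustration only; this is NOT Eigen's implementation.

// Range reduction: x = n*ln(2) + r, so that exp(x) = 2^n * exp(r).
struct Reduced {
  int n;     // power-of-two exponent
  double p;  // exp(r) for the small remainder r
};

inline Reduced reduce(double x) {
  const double kLog2e = 1.4426950408889634;  // 1/ln(2)
  const double kLn2 = 0.6931471805599453;    // ln(2)
  const int n = static_cast<int>(std::lround(x * kLog2e));
  const double r = x - n * kLn2;
  // Assumption: std::exp stands in for the polynomial a vectorized kernel would use.
  return {n, std::exp(r)};
}

// Naive reconstruction: 2^n is materialized as a single double.
// For n == 1024 that factor is already infinity, even when p < 1
// and the true result p * 2^1024 is still representable.
inline double exp_naive(double x) {
  const Reduced red = reduce(x);
  return red.p * std::ldexp(1.0, red.n);
}

// Split reconstruction: apply the scale in two halves so that neither
// factor overflows on its own.
inline double exp_split(double x) {
  const Reduced red = reduce(x);
  const int n1 = red.n / 2;
  const int n2 = red.n - n1;
  return (red.p * std::ldexp(1.0, n1)) * std::ldexp(1.0, n2);
}
```

With x = 709.7, which is below the double overflow threshold of about 709.78, `exp_naive` returns infinity because n rounds to 1024, while `exp_split` returns a finite value.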

Benchmark measurements for AVX2+FMA:

name                       old cpu/op   new cpu/op   delta
BM_eigen_exp_double/8      52.2ns ± 1%  51.8ns ± 2%   -0.88%  (p=0.000 n=50+49)
BM_eigen_exp_double/64      166ns ± 4%   159ns ± 4%   -4.05%  (p=0.000 n=58+58)
BM_eigen_exp_double/512    1.04µs ± 4%  0.99µs ± 6%   -4.34%  (p=0.000 n=58+60)
BM_eigen_exp_double/4k     7.98µs ± 3%  7.72µs ± 6%   -3.33%  (p=0.000 n=57+59)
BM_eigen_exp_double/32k    63.9µs ± 3%  61.1µs ± 6%   -4.47%  (p=0.000 n=56+58)
BM_eigen_exp_double/256k    510µs ± 2%   488µs ± 5%   -4.23%  (p=0.000 n=57+55)
BM_eigen_exp_double/1M     2.06ms ± 4%  1.96ms ± 7%   -4.66%  (p=0.000 n=51+54)
BM_eigen_exp_float/8       28.6ns ± 1%  28.3ns ± 3%   -0.84%  (p=0.000 n=46+42)
BM_eigen_exp_float/64      64.1ns ± 3%  63.9ns ±11%   -0.36%  (p=0.018 n=57+53)
BM_eigen_exp_float/512      330ns ± 3%   317ns ± 4%   -3.73%  (p=0.000 n=50+48)
BM_eigen_exp_float/4k      2.47µs ± 4%  2.38µs ± 4%   -3.54%  (p=0.000 n=54+48)
BM_eigen_exp_float/32k     19.5µs ± 4%  18.8µs ± 6%   -3.39%  (p=0.000 n=59+56)
BM_eigen_exp_float/256k     155µs ± 2%   150µs ± 6%   -3.36%  (p=0.000 n=56+57)
BM_eigen_exp_float/1M       623µs ± 3%   601µs ± 6%   -3.57%  (p=0.000 n=58+58)
