Prevent premature overflow to infinity in exp(x). The change also gives a speedup of roughly 3-5% for most input sizes (under 1% for the smallest inputs).
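
For readers unfamiliar with this failure mode, here is a minimal sketch of how premature overflow can arise in a range-reduced exp and one common way to avoid it. It is an illustration only, not the code in this change: the function names exp_naive, exp_split and exp_reduced are hypothetical, std::exp stands in for the polynomial evaluated on the reduced argument, and splitting the power-of-two scale in half is an assumed technique rather than necessarily the one used here.

```cpp
#include <cmath>

// Hypothetical stand-in for the polynomial approximation of exp(r)
// on the reduced range |r| <= ln(2)/2.
static float exp_reduced(float r) { return std::exp(r); }

// Naive reconstruction: exp(x) = exp(r) * 2^n, with 2^n built first.
// For x just below ln(FLT_MAX), n rounds to 128 and 2^128 is not
// representable in float, so the result is inf even though exp(x) is finite.
float exp_naive(float x) {
  const float ln2 = 0.69314718056f;
  const int n = static_cast<int>(std::nearbyint(x / ln2));
  const float r = x - n * ln2;
  const float scale = std::ldexp(1.0f, n);  // overflows to inf when n == 128
  return exp_reduced(r) * scale;
}

// Assumed fix for illustration: apply the scale in two halves so that no
// intermediate factor overflows before it is multiplied by exp(r) < 1.
float exp_split(float x) {
  const float ln2 = 0.69314718056f;
  const int n = static_cast<int>(std::nearbyint(x / ln2));
  const float r = x - n * ln2;
  const int n1 = n / 2;
  const int n2 = n - n1;
  return exp_reduced(r) * std::ldexp(1.0f, n1) * std::ldexp(1.0f, n2);
}
```

On a typical IEEE-754 target with float arithmetic done in single precision, exp_naive(88.7f) returns infinity while exp_split(88.7f) stays finite and close to std::exp(88.7f).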
Benchmark measurements for AVX2+FMA:
name                      old cpu/op   new cpu/op   delta
BM_eigen_exp_double/8     52.2ns ± 1%  51.8ns ± 2%  -0.88%  (p=0.000 n=50+49)
BM_eigen_exp_double/64     166ns ± 4%   159ns ± 4%  -4.05%  (p=0.000 n=58+58)
BM_eigen_exp_double/512   1.04µs ± 4%  0.99µs ± 6%  -4.34%  (p=0.000 n=58+60)
BM_eigen_exp_double/4k    7.98µs ± 3%  7.72µs ± 6%  -3.33%  (p=0.000 n=57+59)
BM_eigen_exp_double/32k   63.9µs ± 3%  61.1µs ± 6%  -4.47%  (p=0.000 n=56+58)
BM_eigen_exp_double/256k   510µs ± 2%   488µs ± 5%  -4.23%  (p=0.000 n=57+55)
BM_eigen_exp_double/1M    2.06ms ± 4%  1.96ms ± 7%  -4.66%  (p=0.000 n=51+54)
BM_eigen_exp_float/8      28.6ns ± 1%  28.3ns ± 3%  -0.84%  (p=0.000 n=46+42)
BM_eigen_exp_float/64     64.1ns ± 3%  63.9ns ±11%  -0.36%  (p=0.018 n=57+53)
BM_eigen_exp_float/512     330ns ± 3%   317ns ± 4%  -3.73%  (p=0.000 n=50+48)
BM_eigen_exp_float/4k     2.47µs ± 4%  2.38µs ± 4%  -3.54%  (p=0.000 n=54+48)
BM_eigen_exp_float/32k    19.5µs ± 4%  18.8µs ± 6%  -3.39%  (p=0.000 n=59+56)
BM_eigen_exp_float/256k    155µs ± 2%   150µs ± 6%  -3.36%  (p=0.000 n=56+57)
BM_eigen_exp_float/1M      623µs ± 3%   601µs ± 6%  -3.57%  (p=0.000 n=58+58)