Make sure exp(-Inf) is zero for vectorized expressions.
Reference issue
This fixes #2385 (closed)
What does this implement/fix?
Before this fix, exp() returned a non-zero value for -Inf arguments when the expression was vectorized, i.e. when the array was at least as long as the packet size of the corresponding scalar type.
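Because the bug only appeared once the array length reached the packet size, a check for this behavior should cover a range of lengths rather than a single element. The following is a minimal sketch of such a check, not the test added by this change. The helper names `exp_of_neg_inf_is_zero` and `reference_exp` are hypothetical, and `std::exp` applied element by element stands in for the vectorized kernel so that the example compiles without Eigen.

```cpp
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical stand-in for an elementwise exp kernel. With Eigen one would
// evaluate an array expression such as x.exp() here instead.
template <typename Scalar>
std::vector<Scalar> reference_exp(const std::vector<Scalar>& x) {
  std::vector<Scalar> y(x.size());
  for (std::size_t i = 0; i < x.size(); ++i) {
    y[i] = std::exp(x[i]);
  }
  return y;
}

// Hypothetical helper: returns true if exp_fn maps every -Inf element to
// exactly zero for all array lengths from 1 to max_len. Sweeping the length
// exercises inputs both shorter and longer than a SIMD packet.
template <typename Scalar, typename ExpFn>
bool exp_of_neg_inf_is_zero(std::size_t max_len, ExpFn exp_fn) {
  const Scalar neg_inf = -std::numeric_limits<Scalar>::infinity();
  for (std::size_t n = 1; n <= max_len; ++n) {
    const std::vector<Scalar> x(n, neg_inf);
    const std::vector<Scalar> y = exp_fn(x);
    if (y.size() != n) return false;
    for (Scalar v : y) {
      if (v != Scalar(0)) return false;  // any non-zero result is a failure
    }
  }
  return true;
}
```

Calling `exp_of_neg_inf_is_zero<float>(64, reference_exp<float>)` and the `double` equivalent should return true for a correct implementation. A kernel that returns a small positive value for -Inf inputs would make the helper return false.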
Additional information
For AVX2, this change gives a small speedup for float (roughly 2-3.5% for sizes of 8 and above) and is neutral for double (differences of roughly 1% or less).
AVX2 on Skylake:
name                      old cpu/op   new cpu/op   delta
BM_eigen_exp_double/1     3.54ns ± 0%  3.54ns ± 0%  -0.09%   (p=0.005 n=50+49)
BM_eigen_exp_double/8     58.8ns ± 1%  59.0ns ± 3%  ~        (p=0.385 n=43+56)
BM_eigen_exp_double/64    201ns ± 4%   200ns ± 4%   ~        (p=0.299 n=59+60)
BM_eigen_exp_double/512   1.29µs ± 2%  1.28µs ± 3%  -0.73%   (p=0.001 n=59+59)
BM_eigen_exp_double/4k    9.92µs ± 2%  9.90µs ± 3%  ~        (p=0.435 n=59+59)
BM_eigen_exp_double/32k   78.8µs ± 2%  78.9µs ± 3%  ~        (p=0.584 n=58+59)
BM_eigen_exp_double/256k  634µs ± 2%   628µs ± 3%   -0.96%   (p=0.000 n=59+58)
BM_eigen_exp_double/1M    2.54ms ± 2%  2.51ms ± 2%  -1.24%   (p=0.000 n=34+33)
BM_eigen_exp_float/1      3.27ns ± 0%  3.27ns ± 0%  -0.10%   (p=0.000 n=50+47)
BM_eigen_exp_float/8      30.3ns ± 5%  29.6ns ± 0%  -2.34%   (p=0.001 n=54+50)
BM_eigen_exp_float/64     81.3ns ± 2%  79.6ns ± 2%  -2.11%   (p=0.000 n=58+58)
BM_eigen_exp_float/512    471ns ± 4%   455ns ± 3%   -3.40%   (p=0.000 n=60+58)
BM_eigen_exp_float/4k     3.58µs ± 3%  3.45µs ± 3%  -3.53%   (p=0.000 n=50+49)
BM_eigen_exp_float/32k    28.5µs ± 3%  27.5µs ± 3%  -3.52%   (p=0.000 n=54+52)
BM_eigen_exp_float/256k   227µs ± 4%   220µs ± 3%   -3.27%   (p=0.000 n=49+49)
BM_eigen_exp_float/1M     908µs ± 4%   884µs ± 2%   -2.65%   (p=0.000 n=42+43)
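For SSE, the change nets a 4-7% speedup for most sizes. The exceptions are size 1, which is roughly neutral, and BM_eigen_exp_float/8, which is 12.65% slower: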
name                      old cpu/op   new cpu/op   delta
BM_eigen_exp_double/1     1.90ns ± 0%  1.90ns ± 1%  ~        (p=0.567 n=48+60)
BM_eigen_exp_double/8     48.2ns ± 0%  45.9ns ± 0%  -4.76%   (p=0.000 n=49+51)
BM_eigen_exp_double/64    348ns ± 2%   328ns ± 2%   -5.94%   (p=0.000 n=50+49)
BM_eigen_exp_double/512   2.74µs ± 0%  2.56µs ± 0%  -6.61%   (p=0.000 n=44+53)
BM_eigen_exp_double/4k    21.9µs ± 0%  20.5µs ± 0%  -6.41%   (p=0.000 n=58+50)
BM_eigen_exp_double/32k   175µs ± 0%   163µs ± 0%   -6.52%   (p=0.000 n=52+50)
BM_eigen_exp_double/256k  1.40ms ± 0%  1.31ms ± 0%  -6.45%   (p=0.000 n=54+51)
BM_eigen_exp_double/1M    5.59ms ± 0%  5.23ms ± 0%  -6.41%   (p=0.000 n=43+43)
BM_eigen_exp_float/1      1.87ns ± 2%  1.89ns ± 0%  +1.06%   (p=0.000 n=60+53)
BM_eigen_exp_float/8      22.5ns ± 0%  25.3ns ± 0%  +12.65%  (p=0.000 n=54+48)
BM_eigen_exp_float/64     149ns ± 0%   142ns ± 0%   -4.84%   (p=0.000 n=59+50)
BM_eigen_exp_float/512    1.17µs ± 0%  1.11µs ± 0%  -5.07%   (p=0.000 n=54+52)
BM_eigen_exp_float/4k     9.36µs ± 0%  8.87µs ± 0%  -5.21%   (p=0.000 n=52+55)
BM_eigen_exp_float/32k    74.9µs ± 0%  70.9µs ± 0%  -5.41%   (p=0.000 n=54+53)
BM_eigen_exp_float/256k   599µs ± 0%   569µs ± 0%   -5.11%   (p=0.000 n=58+53)
BM_eigen_exp_float/1M     2.39ms ± 0%  2.27ms ± 0%  -4.92%   (p=0.000 n=33+32)
Edited by Rasmus Munk Larsen