Make sure exp(-Inf) is zero for vectorized expressions.

Reference issue

This fixes #2385 (closed)

What does this implement/fix?

Before this fix, exp() would return a non-zero value for -Inf arguments if the expression was vectorized (i.e. when the array is at least as long as the packet size of the corresponding scalar type, so that the SIMD code path is taken).
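
To illustrate how a vectorized exp kernel can end up non-zero at -Inf, here is a minimal scalar sketch. It is not Eigen's code and the source does not confirm that this is the mechanism. The function names `exp_clamped` and `exp_clamped_fixed` and the bounds `kLo` and `kHi` are hypothetical, chosen only for illustration. The assumption is that the kernel clamps its argument to a finite range before range reduction, so -Inf is silently turned into the lower bound.

```cpp
#include <algorithm>
#include <cmath>
#include <limits>

// Hypothetical clamp bounds, chosen for illustration only.
constexpr float kLo = -87.0f;
constexpr float kHi = 88.0f;

// Scalar stand-in for a SIMD exp kernel (an assumption, not Eigen's code).
// Clamping keeps the range reduction finite, but it also maps -Inf to
// exp(kLo), which is tiny yet strictly greater than zero.
inline float exp_clamped(float x) {
  const float c = std::min(std::max(x, kLo), kHi);
  return std::exp(c);  // stands in for the polynomial approximation
}

// Same kernel followed by an explicit select that forces exactly zero
// for -Inf, mimicking a per-lane select in a packet implementation.
inline float exp_clamped_fixed(float x) {
  const float r = exp_clamped(x);
  const bool is_neg_inf = std::isinf(x) && x < 0.0f;
  return is_neg_inf ? 0.0f : r;
}
```

With these definitions, `exp_clamped(-std::numeric_limits<float>::infinity())` is a small positive number, while `exp_clamped_fixed` returns exactly zero for the same input and leaves finite inputs unchanged.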

Additional information

For AVX2 this change gives a small speedup for float (about 2-3.5% for sizes of 8 and above) and is neutral for double (within about 1%).

AVX2 on Skylake:

name                      old cpu/op  new cpu/op  delta
BM_eigen_exp_double/1      3.54ns ± 0%  3.54ns ± 0%  -0.09%  (p=0.005 n=50+49)
BM_eigen_exp_double/8      58.8ns ± 1%  59.0ns ± 3%    ~     (p=0.385 n=43+56)
BM_eigen_exp_double/64      201ns ± 4%   200ns ± 4%    ~     (p=0.299 n=59+60)
BM_eigen_exp_double/512    1.29µs ± 2%  1.28µs ± 3%  -0.73%  (p=0.001 n=59+59)
BM_eigen_exp_double/4k     9.92µs ± 2%  9.90µs ± 3%    ~     (p=0.435 n=59+59)
BM_eigen_exp_double/32k    78.8µs ± 2%  78.9µs ± 3%    ~     (p=0.584 n=58+59)
BM_eigen_exp_double/256k    634µs ± 2%   628µs ± 3%  -0.96%  (p=0.000 n=59+58)
BM_eigen_exp_double/1M     2.54ms ± 2%  2.51ms ± 2%  -1.24%  (p=0.000 n=34+33)
BM_eigen_exp_float/1       3.27ns ± 0%  3.27ns ± 0%  -0.10%  (p=0.000 n=50+47)
BM_eigen_exp_float/8       30.3ns ± 5%  29.6ns ± 0%  -2.34%  (p=0.001 n=54+50)
BM_eigen_exp_float/64      81.3ns ± 2%  79.6ns ± 2%  -2.11%  (p=0.000 n=58+58)
BM_eigen_exp_float/512      471ns ± 4%   455ns ± 3%  -3.40%  (p=0.000 n=60+58)
BM_eigen_exp_float/4k      3.58µs ± 3%  3.45µs ± 3%  -3.53%  (p=0.000 n=50+49)
BM_eigen_exp_float/32k     28.5µs ± 3%  27.5µs ± 3%  -3.52%  (p=0.000 n=54+52)
BM_eigen_exp_float/256k     227µs ± 4%   220µs ± 3%  -3.27%  (p=0.000 n=49+49)
BM_eigen_exp_float/1M       908µs ± 4%   884µs ± 2%  -2.65%  (p=0.000 n=42+43)

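For SSE, the change nets a roughly 4-7% speedup for most array sizes above 1; the exception is BM_eigen_exp_float/8, which regresses by 12.65%: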

name                      old cpu/op  new cpu/op  delta
BM_eigen_exp_double/1     1.90ns ± 0%  1.90ns ± 1%     ~     (p=0.567 n=48+60)
BM_eigen_exp_double/8     48.2ns ± 0%  45.9ns ± 0%   -4.76%  (p=0.000 n=49+51)
BM_eigen_exp_double/64     348ns ± 2%   328ns ± 2%   -5.94%  (p=0.000 n=50+49)
BM_eigen_exp_double/512   2.74µs ± 0%  2.56µs ± 0%   -6.61%  (p=0.000 n=44+53)
BM_eigen_exp_double/4k    21.9µs ± 0%  20.5µs ± 0%   -6.41%  (p=0.000 n=58+50)
BM_eigen_exp_double/32k    175µs ± 0%   163µs ± 0%   -6.52%  (p=0.000 n=52+50)
BM_eigen_exp_double/256k  1.40ms ± 0%  1.31ms ± 0%   -6.45%  (p=0.000 n=54+51)
BM_eigen_exp_double/1M    5.59ms ± 0%  5.23ms ± 0%   -6.41%  (p=0.000 n=43+43)
BM_eigen_exp_float/1      1.87ns ± 2%  1.89ns ± 0%   +1.06%  (p=0.000 n=60+53)
BM_eigen_exp_float/8      22.5ns ± 0%  25.3ns ± 0%  +12.65%  (p=0.000 n=54+48)
BM_eigen_exp_float/64      149ns ± 0%   142ns ± 0%   -4.84%  (p=0.000 n=59+50)
BM_eigen_exp_float/512    1.17µs ± 0%  1.11µs ± 0%   -5.07%  (p=0.000 n=54+52)
BM_eigen_exp_float/4k     9.36µs ± 0%  8.87µs ± 0%   -5.21%  (p=0.000 n=52+55)
BM_eigen_exp_float/32k    74.9µs ± 0%  70.9µs ± 0%   -5.41%  (p=0.000 n=54+53)
BM_eigen_exp_float/256k    599µs ± 0%   569µs ± 0%   -5.11%  (p=0.000 n=58+53)
BM_eigen_exp_float/1M     2.39ms ± 0%  2.27ms ± 0%   -4.92%  (p=0.000 n=33+32)
Edited by Rasmus Munk Larsen