
Improve exp<float>(): Don't flush denormal results + 4% speedup.

  1. Speed up exp(x) by reducing the polynomial approximant from degree 7 to degree 6. With exactly representable coefficients computed by the Sollya tool, this still gives a maximum relative error of 1 ulp, i.e. faithfully rounded, for arguments where exp(x) is a normalized float. This change results in a speedup of about 4% for AVX2.

  2. Extend the range where exp(x) returns a non-zero result from ~[-88;88] to ~[-104;88], i.e. return denormalized values for large negative arguments instead of zero. Compared to exp<double>(x), the denormalized results gradually lose accuracy, with the relative error growing to 0.033 for arguments around x = -104, where exp(x) is ~std::numeric_limits<float>::denorm_min(). This is expected and acceptable.
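
The two changes above can be illustrated with a minimal scalar sketch. It is not the vectorized Eigen kernel: the names `exp_sketch` and `zero_threshold` are hypothetical, the arithmetic is done in double for simplicity, and the degree-6 polynomial uses plain Taylor coefficients as a stand-in for the Sollya-generated ones, so it does not reproduce the 1 ulp bound. The clamp limits are taken from the ~[-104;88] range quoted above, and NaN handling is omitted.

```cpp
#include <cmath>
#include <limits>

// Argument below which exp(x) rounds to zero in float under round-to-nearest:
// ln(2^-150), about -103.97. This is where the ~-104 lower limit comes from.
inline double zero_threshold() {
  return std::log(0.5 * static_cast<double>(std::numeric_limits<float>::denorm_min()));
}

// Hypothetical scalar sketch of exp(x) = 2^n * exp(r),
// with n = round(x / ln 2) and |r| <= ln(2) / 2.
inline float exp_sketch(float x) {
  const double ln2 = 0.6931471805599453;
  // Clamp to the range where the float result is finite (assumed limits;
  // NaN inputs are not handled in this sketch).
  const double xd = std::fmin(std::fmax(static_cast<double>(x), -104.0), 88.0);
  const double n = std::nearbyint(xd / ln2);
  const double r = xd - n * ln2;
  // Degree-6 polynomial in Horner form. Taylor coefficients 1/k! are used
  // here as a stand-in; they are NOT the coefficients from the change.
  double p = 1.0 / 720.0;
  p = p * r + 1.0 / 120.0;
  p = p * r + 1.0 / 24.0;
  p = p * r + 1.0 / 6.0;
  p = p * r + 0.5;
  p = p * r + 1.0;
  p = p * r + 1.0;
  // Scale by 2^n in double, then narrow to float. For large negative n the
  // narrowing rounds into the denormal range instead of flushing to zero.
  return static_cast<float>(std::ldexp(p, static_cast<int>(n)));
}
```

With default floating-point settings (no flush-to-zero mode, no fast-math flags), `std::fpclassify(exp_sketch(-100.0f))` should report `FP_SUBNORMAL`, while arguments below `zero_threshold()` give exactly zero. Because a denormal result keeps fewer significant bits the closer it gets to `denorm_min()`, its relative error against a double-precision reference grows as x decreases.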

Benchmark numbers for AVX2.

name                      old cpu/op  new cpu/op  delta
BM_eigen_exp_float/1      3.27ns ± 0%  3.27ns ± 0%    ~     (p=0.218 n=46+48)
BM_eigen_exp_float/8      29.6ns ± 0%  30.1ns ± 6%  +1.56%  (p=0.000 n=41+54)
BM_eigen_exp_float/64     80.4ns ± 5%  79.7ns ± 5%  -0.85%  (p=0.007 n=47+60)
BM_eigen_exp_float/512     460ns ± 2%   441ns ± 2%  -4.31%  (p=0.000 n=60+57)
BM_eigen_exp_float/4k     3.48µs ± 2%  3.35µs ± 2%  -3.52%  (p=0.000 n=49+49)
BM_eigen_exp_float/32k    27.6µs ± 3%  26.6µs ± 3%  -3.75%  (p=0.000 n=54+54)
BM_eigen_exp_float/256k    221µs ± 2%   212µs ± 2%  -3.81%  (p=0.000 n=48+56)
BM_eigen_exp_float/1M      887µs ± 3%   848µs ± 2%  -4.33%  (p=0.000 n=39+54)

name                      old time/op             new time/op             delta
BM_eigen_exp_float/1      3.27ns ± 0%             3.27ns ± 0%    ~           (p=0.475 n=49+48)
BM_eigen_exp_float/8      29.6ns ± 0%             30.1ns ± 6%  +1.54%        (p=0.000 n=41+54)
BM_eigen_exp_float/64     80.4ns ± 5%             79.7ns ± 5%  -0.89%        (p=0.006 n=48+60)
BM_eigen_exp_float/512     460ns ± 2%              441ns ± 2%  -4.31%        (p=0.000 n=60+57)
BM_eigen_exp_float/4k     3.48µs ± 2%             3.35µs ± 2%  -3.52%        (p=0.000 n=49+49)
BM_eigen_exp_float/32k    27.6µs ± 3%             26.6µs ± 3%  -3.73%        (p=0.000 n=54+54)
BM_eigen_exp_float/256k    221µs ± 2%              212µs ± 2%  -3.83%        (p=0.000 n=48+56)
BM_eigen_exp_float/1M      887µs ± 3%              848µs ± 2%  -4.33%        (p=0.000 n=39+54)

name                      old INSTRUCTIONS/op     new INSTRUCTIONS/op     delta
BM_eigen_exp_float/1        41.0 ± 0%               41.0 ± 0%    ~     (all samples are equal)
BM_eigen_exp_float/8         308 ± 0%                308 ± 0%    ~     (all samples are equal)
BM_eigen_exp_float/64        660 ± 0%                632 ± 0%  -4.24%        (p=0.000 n=60+60)
BM_eigen_exp_float/512     3.29k ± 0%              3.04k ± 0%  -7.65%        (p=0.000 n=53+55)
BM_eigen_exp_float/4k      24.3k ± 0%              22.3k ± 0%  -8.39%        (p=0.000 n=45+45)
BM_eigen_exp_float/32k      193k ± 0%               176k ± 0%  -8.50%        (p=0.000 n=49+48)
BM_eigen_exp_float/256k    1.54M ± 0%              1.41M ± 0%  -8.51%        (p=0.000 n=44+54)
BM_eigen_exp_float/1M      6.16M ± 0%              5.64M ± 0%  -8.51%        (p=0.000 n=37+52)

name                      old CYCLES/op           new CYCLES/op           delta
BM_eigen_exp_float/1        12.0 ± 0%               12.0 ± 0%    ~           (p=0.830 n=49+49)
BM_eigen_exp_float/8         109 ± 0%                111 ± 6%  +1.52%        (p=0.000 n=40+54)
BM_eigen_exp_float/64        270 ± 2%                269 ± 5%    ~           (p=0.051 n=47+60)
BM_eigen_exp_float/512     1.55k ± 2%              1.49k ± 2%  -4.10%        (p=0.000 n=57+60)
BM_eigen_exp_float/4k      11.7k ± 1%              11.3k ± 1%  -3.78%        (p=0.000 n=50+40)
BM_eigen_exp_float/32k     93.0k ± 1%              89.5k ± 1%  -3.76%        (p=0.000 n=52+48)
BM_eigen_exp_float/256k     744k ± 1%               715k ± 1%  -3.93%        (p=0.000 n=49+46)
BM_eigen_exp_float/1M      2.99M ± 1%              2.87M ± 1%  -4.02%        (p=0.000 n=40+58)
Edited by Rasmus Munk Larsen
