Speed up pldexp_generic.
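Speeds up pexp for float by up to 6% with AVX2 and by roughly 4% with SSE4.2 (about 11% for a single element). AVX512 is essentially unchanged.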
Measurements on SkylakeX:
SSE4.2:
name                     old cpu/op   new cpu/op   delta
BM_eigen_exp_float/1     1.88ns ± 1%  1.68ns ± 1%  -10.88%  (p=0.000 n=54+47)
BM_eigen_exp_float/8     28.9ns ± 1%  28.5ns ± 0%  -1.37%   (p=0.000 n=51+47)
BM_eigen_exp_float/64    145ns ± 1%   139ns ± 0%   -4.09%   (p=0.000 n=49+43)
BM_eigen_exp_float/512   1.11µs ± 1%  1.06µs ± 0%  -4.42%   (p=0.000 n=42+46)
BM_eigen_exp_float/4k    8.80µs ± 0%  8.40µs ± 0%  -4.54%   (p=0.000 n=42+42)
BM_eigen_exp_float/32k   70.2µs ± 0%  67.6µs ± 3%  -3.74%   (p=0.000 n=46+59)
BM_eigen_exp_float/256k  561µs ± 0%   537µs ± 1%   -4.27%   (p=0.000 n=45+45)
BM_eigen_exp_float/1M    2.24ms ± 0%  2.15ms ± 1%  -4.15%   (p=0.000 n=39+43)
AVX2:
name                     old cpu/op   new cpu/op   delta
BM_eigen_exp_float/1     1.70ns ± 6%  1.70ns ± 5%  ~        (p=0.488 n=60+60)
BM_eigen_exp_float/8     30.9ns ± 0%  30.9ns ± 0%  ~        (p=0.352 n=49+50)
BM_eigen_exp_float/64    84.1ns ± 4%  81.0ns ± 4%  -3.71%   (p=0.000 n=59+58)
BM_eigen_exp_float/512   520ns ± 4%   489ns ± 3%   -5.96%   (p=0.000 n=57+58)
BM_eigen_exp_float/4k    3.99µs ± 4%  3.77µs ± 4%  -5.45%   (p=0.000 n=48+46)
BM_eigen_exp_float/32k   31.8µs ± 5%  29.9µs ± 5%  -5.87%   (p=0.000 n=50+53)
BM_eigen_exp_float/256k  253µs ± 4%   239µs ± 4%   -5.65%   (p=0.000 n=50+53)
BM_eigen_exp_float/1M    1.01ms ± 4%  0.95ms ± 4%  -6.04%   (p=0.000 n=60+56)
AVX512:
name                     old cpu/op   new cpu/op   delta
BM_eigen_exp_float/1     2.64ns ± 1%  2.65ns ± 2%  ~        (p=0.061 n=51+54)
BM_eigen_exp_float/8     33.9ns ± 2%  33.9ns ± 2%  ~        (p=0.546 n=49+46)
BM_eigen_exp_float/64    88.5ns ± 3%  88.7ns ± 4%  ~        (p=0.703 n=57+59)
BM_eigen_exp_float/512   275ns ± 3%   274ns ± 3%   -0.60%   (p=0.009 n=52+54)
BM_eigen_exp_float/4k    1.77µs ± 3%  1.76µs ± 3%  -0.62%   (p=0.006 n=59+59)
BM_eigen_exp_float/32k   13.7µs ± 3%  13.7µs ± 4%  ~        (p=0.153 n=58+60)
BM_eigen_exp_float/256k  119µs ± 5%   118µs ± 4%   ~        (p=0.453 n=60+58)
BM_eigen_exp_float/1M    475µs ± 6%   475µs ± 5%   ~        (p=0.723 n=60+60)