Use pmsub in twoprod. This speeds up pow() on Skylake by roughly 1% (between 0.5% and 1.3%) for benchmark sizes of 64 and up.
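For context, twoprod returns the product of two values as a pair hi + lo, where hi is the rounded product and lo is its exact rounding error. The following is a scalar C++ illustration, not Eigen's implementation. Eigen's version works on SIMD packets, and the names `TwoProd` and `check_twoprod` are hypothetical and used only here. The sketch assumes IEEE-754 single precision with round-to-nearest and no overflow or underflow. The commit message does not explain why pmsub is faster. A plausible reading is that computing lo as a single fused multiply-subtract (x * y - hi) avoids the separate negation that a multiply-add of x, y and -hi would otherwise need.

```cpp
#include <cassert>
#include <cmath>

// Scalar sketch of the twoprod error-free transformation (illustration only;
// Eigen's twoprod operates on SIMD packets via pmul and pmsub).
struct TwoProd {
  float hi;  // x * y rounded to float
  float lo;  // rounding error of hi, so hi + lo == x * y exactly
};

// Assumes no overflow or underflow in x * y.
inline TwoProd twoprod(float x, float y) {
  TwoProd r;
  r.hi = x * y;
  // lo = x * y - hi with a single rounding, which is exact here.
  // std::fma has no subtract form, so hi is negated in the source; compilers
  // targeting FMA hardware typically emit one fused multiply-subtract for
  // this, which is the operation pmsub(x, y, hi) names at the packet level.
  r.lo = std::fma(x, y, -r.hi);
  return r;
}

// Checks the defining property: a float product has at most 48 significant
// bits, so it is exact in double, and hi + lo must reproduce it exactly.
inline void check_twoprod(float x, float y) {
  const TwoProd r = twoprod(x, y);
  const double exact = static_cast<double>(x) * static_cast<double>(y);
  assert(static_cast<double>(r.hi) + static_cast<double>(r.lo) == exact);
}
```

As a worked case, for x = 1 + 2^-12 the exact square is 1 + 2^-11 + 2^-24. The last term is exactly half an ulp at that magnitude and is dropped under round-to-nearest-even, so twoprod(x, x) gives hi = 1 + 2^-11 and lo = 2^-24.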

Benchmark results, compiled with -march=skylake (a "~" delta means the difference is not statistically significant):

name                             old cpu/op   new cpu/op  delta
BM_eigen_powquarter_float/1     3.82ns ± 1%  3.82ns ± 1%    ~     (p=0.330 n=46+43)
BM_eigen_powquarter_float/8     95.0ns ± 0%  95.0ns ± 0%    ~     (p=0.779 n=51+56)
BM_eigen_powquarter_float/64    1.03µs ± 3%  1.02µs ± 2%  -0.85%  (p=0.000 n=51+56)
BM_eigen_powquarter_float/512   8.48µs ± 2%  8.43µs ± 3%  -0.59%  (p=0.002 n=52+57)
BM_eigen_powquarter_float/4k    68.4µs ± 3%  67.5µs ± 3%  -1.27%  (p=0.000 n=54+53)
BM_eigen_powquarter_float/32k    546µs ± 4%   541µs ± 3%  -0.93%  (p=0.007 n=57+56)
BM_eigen_powquarter_float/256k  4.38ms ± 5%  4.33ms ± 3%  -0.99%  (p=0.001 n=56+55)
BM_eigen_powquarter_float/1M    17.4ms ± 3%  17.3ms ± 3%  -0.51%  (p=0.032 n=53+54)
