Use pmsub in twoprod. This speeds up pow() on Skylake by ~1% for benchmark sizes of 64 and up; sizes 1 and 8 show no significant change.
Benchmark numbers with -march=skylake:
name                               old cpu/op   new cpu/op   delta
BM_eigen_powquarter_float/1        3.82ns ± 1%  3.82ns ± 1%    ~     (p=0.330 n=46+43)
BM_eigen_powquarter_float/8        95.0ns ± 0%  95.0ns ± 0%    ~     (p=0.779 n=51+56)
BM_eigen_powquarter_float/64       1.03µs ± 3%  1.02µs ± 2%  -0.85%  (p=0.000 n=51+56)
BM_eigen_powquarter_float/512      8.48µs ± 2%  8.43µs ± 3%  -0.59%  (p=0.002 n=52+57)
BM_eigen_powquarter_float/4k       68.4µs ± 3%  67.5µs ± 3%  -1.27%  (p=0.000 n=54+53)
BM_eigen_powquarter_float/32k       546µs ± 4%   541µs ± 3%  -0.93%  (p=0.007 n=57+56)
BM_eigen_powquarter_float/256k     4.38ms ± 5%  4.33ms ± 3%  -0.99%  (p=0.001 n=56+55)
BM_eigen_powquarter_float/1M       17.4ms ± 3%  17.3ms ± 3%  -0.51%  (p=0.032 n=53+54)
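
For readers unfamiliar with twoprod, here is a minimal scalar sketch of the idea, not the actual Eigen code. It assumes twoprod is the usual FMA-based error-free product (p_hi is the rounded product, p_lo the rounding error) and that pmsub(a, b, c) computes a*b - c in one fused operation. The names pmsub_scalar and twoprod_scalar are hypothetical stand-ins for the packet versions.

```cpp
#include <cmath>

// Hypothetical scalar stand-in for a packet pmsub: a*b - c with one rounding.
// On FMA hardware a vectorized version can map to a single fused
// multiply-subtract, instead of a separate negate followed by a fused
// multiply-add (assumption about where the speedup comes from).
inline float pmsub_scalar(float a, float b, float c) {
  return std::fma(a, b, -c);
}

// Error-free product sketch: x*y == p_hi + p_lo exactly
// (barring overflow and underflow).
inline void twoprod_scalar(float x, float y, float& p_hi, float& p_lo) {
  p_hi = x * y;                    // rounded product
  p_lo = pmsub_scalar(x, y, p_hi); // exact rounding error of p_hi
}
```

In scalar code the negation is folded into the fma call either way, so this only illustrates the arithmetic; any saving would show up in the packet (SIMD) path.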