Eigen does not vectorize pow
Scalar std::pow is quite slow, but we have the necessary building blocks to build a vectorized version. I am working on a merge request along the following lines:
WIP: Vectorize pow(x, y)
in Eigen.
Without loss of generality, consider the case for x>0. For x < 0, pow
is only defined for integer powers y and will be special cased in the implementation. The description below also assumes that pfrexp
"does the right thing" for denormals, which is not currently the case - it always adds 0.5 to the fractional part, which is incorrect for denormals. (Fixing this issue if frexp
should significantly improve the accuracy of log
for subnormal numbers, I think).
Analytically, x^y = exp(y*log(x))
. The approach described below tries to improve accuracy of the naive implementation by working in base 2 and keeping intermediate entities in split form, while reusing existing parts of implementations of pexp
, plog2
, pfrexp
, and pldexp
.
First, let's write the inputs in the split form
y = n_y + r_y. (1)
x = 2^e_x * m_x
<=>
log2(x) = e_x + log2(m_x)
= e_x + r_x. (2)
where e_x and n_y are integer and r_x and r_y are fractional parts, r_x is in [-1, 0), and r_y is in [-0.5, 0.5). Now we can write the desired result as
x^y = exp(y*log(x))
= 2^(y*log2(x)) (3)
By substituting (1) and (2) we get the following expression for the exponent in (3):
y * log2(x) = (n_y + r_y) * (e_x + r_x)
= n_y * e_x + f
= n_z + r_z
where
f = (n_y + r_y) * r_x + r_y * e_x
= y * r_x + r_y * e_x,
n_z := n_y * e_x + round(f)
r_z := f - round(f), r_z will be in [-0.5, 0.5).
Using these quantities we can now compute the rhs of (3) as
x^y = 2^n_z * 2^r_z
= 2^n_z * e^{ln(2) * r_z}
which in code looks like
x^y = ldexp(exp(M_LN2 * r_z), n_z)
All operations here are exact, except computing r_x = log2(m_x), f, and exp(M_LN2 * r_z).
The is some risk of cancellation in computing f, which I'm pondering how to avoid.
Maybe it's sufficient to split y * r_x
and r_y * e_x
into integer and fractional parts separately to get the integer and fractional parts of f.