Skip to content

Eigen does not vectorize pow

Scalar std::pow is quite slow, but we have the necessary building blocks to build a vectorized version. I am working on a merge request along the following lines:

WIP: Vectorize pow(x, y) in Eigen.

Without loss of generality, consider the case for x>0. For x < 0, pow is only defined for integer powers y and will be special cased in the implementation. The description below also assumes that pfrexp "does the right thing" for denormals, which is not currently the case - it always adds 0.5 to the fractional part, which is incorrect for denormals. (Fixing this issue if frexp should significantly improve the accuracy of log for subnormal numbers, I think).

Analytically, x^y = exp(y*log(x)). The approach described below tries to improve accuracy of the naive implementation by working in base 2 and keeping intermediate entities in split form, while reusing existing parts of implementations of pexp, plog2, pfrexp, and pldexp.

First, let's write the inputs in the split form

   y = n_y + r_y.                        (1)

   x = 2^e_x * m_x 
   <=> 
   log2(x) = e_x + log2(m_x) 
           = e_x + r_x.                  (2)

where e_x and n_y are integer and r_x and r_y are fractional parts, r_x is in [-1, 0), and r_y is in [-0.5, 0.5). Now we can write the desired result as

   x^y = exp(y*log(x)) 
       = 2^(y*log2(x))   (3)

By substituting (1) and (2) we get the following expression for the exponent in (3):

   y * log2(x) = (n_y + r_y) * (e_x + r_x)
               = n_y * e_x + f
               = n_z + r_z

where

   f = (n_y + r_y) * r_x + r_y * e_x 
     = y * r_x + r_y * e_x,
   n_z := n_y * e_x + round(f)
   r_z := f - round(f), r_z will be in [-0.5, 0.5).

Using these quantities we can now compute the rhs of (3) as

   x^y = 2^n_z * 2^r_z 
       = 2^n_z * e^{ln(2) * r_z}

which in code looks like

     x^y = ldexp(exp(M_LN2 * r_z), n_z)

All operations here are exact, except computing r_x = log2(m_x), f, and exp(M_LN2 * r_z).

The is some risk of cancellation in computing f, which I'm pondering how to avoid. Maybe it's sufficient to split y * r_x and r_y * e_x into integer and fractional parts separately to get the integer and fractional parts of f.

Edited by Rasmus Munk Larsen