Small cleanup and optimization of generic plog implementations:

Background: The algorithm splits the input into fractional and exponent parts, x = m * 2^e, and computes log(x) as log(m) + e*ln(2). The addition of the term e*ln(2) is performed in two steps by splitting it into c1*e*ln(2) and c2*e*ln(2), where c1 + c2 = 1, and adding the two pieces separately to the logarithm of the fractional part. This code dates back to the original Cephes code from which the algorithm is adapted, and appears to be done to reduce truncation errors in some cases and to prevent the compiler from reordering the addition of the three terms in the approximation

log(m) = log(1+x) ~= x - 0.5*x^2 + x^3*P(x)/Q(x)

which must be added in reverse order since |x| <= (sqrt(2)-1).

Replacing the two-step addition with a single FMA speeds up the code by 5-7% on a Skylake processor. It preserves the maximum relative error bound of 1 ULP, while slightly increasing the RMS relative error; see details below.

This change also cleans up the docstring for plog_double, which mentions a second series expansion from Cephes that is not actually used in Eigen.

FWIW: this code originates from the original commit of vectorized transcendental functions in Eigen, 17860e57.

Below are error statistics comparing plog to std::log(double(x)) for all normal float values:

Before this change (two-step addition):

max_abs_error = 7.62939e-06
max_rel_error = 1.19209e-07
rms_rel_error = 7.27859e-09

The max_abs_error and max_rel_error are unchanged, but the RMS relative error varies slightly between strategies:

| Two-step addition | Two-step addition with FMA | Single-step addition with FMA | Single-step addition without FMA |
| --- | --- | --- | --- |
| 6.63155e-09 | 6.57811e-09 | 1.05781e-08 | 1.05781e-08 |

The same statistics for double:

max_abs_error = 1.42109e-14
max_rel_error = 2.22045e-16
rms_rel_error = 1.20733e-17

RMS relative error:

| Two-step addition | Two-step addition with FMA | Single-step addition with FMA | Single-step addition without FMA |
| --- | --- | --- | --- |
| 1.2341e-17 | 1.22201e-17 | 7.38796e-17 | 8.82386e-17 |

BTW: the log approximation is pretty poor for denormal floating-point numbers. If I include denormals in the sweep for float I get:

max_abs_error = 15.9424
max_rel_error = 0.154362
rms_rel_error = 0.000980872
Edited by Rasmus Munk Larsen
