
Fix accuracy of logistic sigmoid

Fix accuracy of the specialized float32 implementation of the logistic (sigmoid) function S(x) = exp(x)/(1+exp(x)) in Eigen. The reason to have this specialization in the first place is that logistic sigmoid is a frequently used function in machine learning and statistics, so even a modestly (~30%) faster version can have an impact in applications.

The old implementation was very inaccurate around x=-9 where we would switch from a fast rational approximant to returning exp(x), which has the right asymptotic behavior for large negative x. This approach would have errors >8000 ulps around x=-9 and also be slow for SIMD packets containing elements less than -9, since we would compute both the rational approximant and exp(x).

The new algorithm uses a hybrid range reduction method: first, the standard range reduction used in pexp(x) is applied; second, we use the identity exp(r) = exp(r/2)^2 to avoid generating denormalized intermediate values for large negative arguments. This lets us use a version of pldexp that does not properly handle denormals but is significantly faster.

The final result is an implementation with a maximum relative error of 4.5 ulps for normalized results. In addition, the old algorithm would return zero for x < ~-88, while the new algorithm extends the range down to x ~= -104, where S(x) ~= std::numeric_limits<float>::denorm_min(). Relative accuracy degrades gradually to 0.033 near this endpoint, which is expected and acceptable for denormalized results.

The new algorithm is about 30% faster than computing e = pexp(x); s = e / (1 + e), while the old inaccurate algorithm was about 40% faster when no arguments x < -9 are present and about 2x slower when they are present.

This change also extends the range of the non-specialized version to properly compute denormalized values of S(x) for large negative x, e.g. the range ~[-104, -88] for float, below which S(x) is too small to be represented even as a denormalized float. This is accomplished by evaluating S(x) as exp(x) / (1 + exp(x)) instead of 1 / (1 + exp(-x)).

Thanks to my Google colleagues James Lottes and Sameer Agarwal for the discussions and experimentation that led to this.

Edited by Rasmus Munk Larsen
