More numerical stability for Math.
Math sums N floats from 0 to N−1, accumulating up to O(N) round-off error (O(√N) on average, if Wikipedia is right). The simple pairwise summation method reduces this to O(log N) and, unlike its big brothers, has no speed penalty (at least not on my machine, given the chosen cutoff). For reference, Kahan summation is marginally better and 4× slower for obvious reasons, and Python uses an exact algorithm for math.fsum(), which is even more complex and even slower.
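The pairwise scheme itself is tiny: below some cutoff fall back to a plain loop, otherwise split the array in half and sum the halves recursively. A minimal C sketch (the cutoff value and the function names are illustrative assumptions, not the actual patch):

```c
#include <stddef.h>

/* Illustrative cutoff; the real value would be tuned for speed. */
#define PAIRWISE_CUTOFF 8

/* Plain left-to-right summation: error grows roughly with N. */
static double sum_linear(const double *a, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Pairwise summation: recursion depth is log2(N), so the
   accumulated error grows roughly with log N instead of N. */
static double sum_pairwise(const double *a, size_t n)
{
    if (n <= PAIRWISE_CUTOFF)
        return sum_linear(a, n);
    size_t half = n / 2;
    return sum_pairwise(a, half) + sum_pairwise(a + half, n - half);
}
```

The cutoff is what keeps it as fast as the plain loop: small subarrays are summed linearly, so the recursion overhead is amortized over blocks.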
A demonstration that sums 1 plus N repeats of 0.05 / N (so that the exact result, ignoring the inexactness of the computed 0.05 / N itself, must be 1.05) using both methods:
My output on x86-64/win64:
sizeof(float) = 8
--- 1 + 0.05/100000 + ... x100000... + 0.05/100000 ---
sum_linear = 1.0500000000069889E+000
31475 ULP away from 1.05
sum_pairwise = 1.0500000000000003E+000
1 ULP away from 1.05
--- 1 + 0.05/1000000 + ... x1000000... + 0.05/1000000 ---
sum_linear = 1.0499999999181711E+000
368525 ULP away from 1.05
sum_pairwise = 1.0499999999999994E+000
3 ULP away from 1.05
--- 1 + 0.05/10000000 + ... x10000000... + 0.05/10000000 ---
sum_linear = 1.0499999996961265E+000
1368525 ULP away from 1.05
sum_pairwise = 1.0499999999999998E+000
1 ULP away from 1.05
--- 1 + 0.05/100000000 + ... x100000000... + 0.05/100000000 ---
sum_linear = 1.0500000041370186E+000
18631475 ULP away from 1.05
sum_pairwise = 1.0500000000000005E+000
2 ULP away from 1.05
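The "N ULP away" figures can be obtained by comparing the ordered bit patterns of the two doubles; for finite values of the same sign, the difference of the raw bit patterns is exactly the number of representable doubles between them. A sketch of that measurement (my assumption about how the demonstration computes it, not code from the attachment):

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret a double's bits as an unsigned 64-bit integer. */
static uint64_t double_bits(double x)
{
    uint64_t b;
    memcpy(&b, &x, sizeof b);
    return b;
}

/* ULP distance between two finite doubles of the same sign:
   the number of representable values separating them. */
static uint64_t ulp_distance(double a, double b)
{
    uint64_t ba = double_bits(a), bb = double_bits(b);
    return ba > bb ? ba - bb : bb - ba;
}
```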
A test that covers all changed functions:
BTW: can generics be used to reduce code repetition, or does that conflict with supporting older compiler versions?