Avoid producing erf(x) = NaN for large |x|.
Fixes a bug in !1706 (merged)
The fix causes a slowdown of ~5%. The speedup is still up to ~28% (for AVX+FMA) compared to the original implementation.
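For illustration only, here is a minimal sketch of one way an erf approximation can return NaN for large |x|, and how clamping the argument avoids it. This is not the kernel from this change: it uses Winitzki's closed-form approximation as a stand-in, and the names `erf_unclamped`, `erf_clamped`, and the bound `kClamp = 6.0` are assumptions chosen for the example.

```cpp
#include <cmath>

// Hypothetical stand-in (Winitzki's approximation), NOT the Eigen kernel:
//   erf(x) ~ sign(x) * sqrt(1 - exp(-x^2 * (4/pi + a*x^2) / (1 + a*x^2)))
// Once x^2 overflows to +inf, the ratio becomes inf/inf = NaN.
inline double erf_unclamped(double x) {
  const double a = 0.140012;
  const double four_over_pi = 1.2732395447351628;
  const double x2 = x * x;
  const double ratio = (four_over_pi + a * x2) / (1.0 + a * x2);
  return std::copysign(std::sqrt(1.0 - std::exp(-x2 * ratio)), x);
}

// Clamp the argument first. The bound is an assumption for this sketch:
// in double precision erf(6) already rounds to 1, so clamping does not
// change the result there. Ternaries (not std::clamp) let a NaN input
// propagate unchanged.
inline double erf_clamped(double x) {
  const double kClamp = 6.0;
  const double xc = x > kClamp ? kClamp : (x < -kClamp ? -kClamp : x);
  return erf_unclamped(xc);
}
```

The extra clamp is a small amount of additional work per call, which is the kind of cost that can show up as a minor slowdown in a vectorized kernel.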
Edited by Rasmus Munk Larsen