Make sure we return +/-1 above the clamping point for Erf().

This also gives a small speedup in some cases (roughly 0.4% to 10% in the benchmarks below), measured here for AVX2 on Skylake, compiled with -march=skylake.

name                      old cpu/op   new cpu/op   delta
BM_eigen_erf_float/1      1.10ns ± 0%  1.09ns ± 0%   -0.43%  (p=0.000 n=55+57)
BM_eigen_erf_float/8      13.9ns ± 1%  12.5ns ± 6%  -10.05%  (p=0.000 n=48+60)
BM_eigen_erf_float/64     38.9ns ± 6%  36.4ns ± 3%   -6.31%  (p=0.000 n=46+42)
BM_eigen_erf_float/512     231ns ± 3%   221ns ± 4%   -4.17%  (p=0.000 n=52+47)
BM_eigen_erf_float/4k     1.80µs ± 3%  1.73µs ± 5%   -3.55%  (p=0.000 n=58+53)
BM_eigen_erf_float/32k    14.2µs ± 3%  13.8µs ± 7%   -3.33%  (p=0.000 n=51+54)
BM_eigen_erf_float/256k    117µs ± 5%   115µs ± 5%   -1.76%  (p=0.000 n=59+57)
BM_eigen_erf_float/1M      470µs ± 3%   463µs ± 6%   -1.47%  (p=0.000 n=58+60)
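
For illustration, the saturation behavior named in the title can be sketched in scalar C++. This is a minimal sketch under stated assumptions, not the code from this change: the names `kErfClamp`, `erf_in_range`, and `erf_with_saturation` are hypothetical, the threshold of 4.0f is an assumed value (chosen because erf(4) already rounds to 1 in single precision), and `std::erf` stands in for whatever approximation is used below the clamping point.

```cpp
#include <cmath>

// Assumed clamping point for float; the actual value used by the
// implementation in this change may differ.
constexpr float kErfClamp = 4.0f;

// Stand-in for the in-range approximation. std::erf is used here only
// so the sketch is self-contained.
inline float erf_in_range(float x) { return std::erf(x); }

// Returns exactly +1 or -1 once |x| reaches the clamping point, instead
// of evaluating the approximation on a clamped argument.
inline float erf_with_saturation(float x) {
  if (std::isnan(x)) return x;  // propagate NaN unchanged
  if (std::fabs(x) >= kErfClamp) {
    return std::copysign(1.0f, x);  // saturate to the sign of x
  }
  return erf_in_range(x);
}
```

With this shape, any input at or beyond the threshold, including infinities, yields exactly ±1 rather than a value that is merely close to it.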