Vectorize erfc() for float
This adds a vectorized implementation of the complementary error function erfc() for float. The implementation is accurate to within 34 ulps across the entire input range and to within 5 ulps for |x| < 1.
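For readers who want to check an ulp bound like this themselves, here is a minimal sketch of one way to measure the error. It is not the test used in this change. The helpers `ordered_bits`, `ulp_distance`, and `max_ulp_error` are hypothetical names introduced for illustration, and the reference value is assumed to be `std::erfc` evaluated in double precision and rounded to float.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Map a float's bit pattern to an integer that increases monotonically with
// the float's value, so adjacent representable floats differ by exactly 1.
// (Hypothetical helper; NaN inputs are not handled.)
inline std::int64_t ordered_bits(float f) {
  std::int32_t i;
  std::memcpy(&i, &f, sizeof(i));
  // Negative floats are stored as sign + magnitude; flip them to -magnitude.
  return i < 0 ? static_cast<std::int64_t>(INT32_MIN) - i
               : static_cast<std::int64_t>(i);
}

// Number of representable floats between a and b (0 if they are equal).
inline std::int64_t ulp_distance(float a, float b) {
  const std::int64_t d = ordered_bits(a) - ordered_bits(b);
  return d < 0 ? -d : d;
}

// Largest ulp distance between candidate(x) and a double-precision reference
// rounded to float, sampled uniformly on [lo, hi].
template <typename F>
std::int64_t max_ulp_error(F candidate, float lo, float hi, int samples) {
  assert(samples >= 2);
  std::int64_t worst = 0;
  for (int k = 0; k < samples; ++k) {
    const float x = lo + (hi - lo) * static_cast<float>(k) /
                             static_cast<float>(samples - 1);
    const float ref = static_cast<float>(std::erfc(static_cast<double>(x)));
    worst = std::max(worst, ulp_distance(candidate(x), ref));
  }
  return worst;
}
```

Passing the function under test as `candidate`, for example a lambda that wraps the float erfc being evaluated, and sweeping first `[-1, 1]` and then a wider interval would give numbers comparable in spirit to the two bounds quoted above. A uniform sweep only samples the range, so it can underestimate the true worst case.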
Benchmark measurements compared to the old implementation using std::erfc():

SSE 4.2, |x| < 1:
name                      old cpu/op   new cpu/op   delta
BM_eigen_erfc_float/8     39.5ns ± 0%  21.5ns ± 0%  -45.56%  (p=0.000 n=41+52)
BM_eigen_erfc_float/64    336ns ± 1%   174ns ± 0%   -48.26%  (p=0.000 n=42+51)
BM_eigen_erfc_float/512   2.67µs ± 1%  1.40µs ± 0%  -47.78%  (p=0.000 n=49+50)
BM_eigen_erfc_float/4k    21.4µs ± 1%  11.2µs ± 0%  -47.75%  (p=0.000 n=52+53)
BM_eigen_erfc_float/32k   171µs ± 1%   89µs ± 1%    -47.82%  (p=0.000 n=54+55)
BM_eigen_erfc_float/256k  1.37ms ± 1%  0.71ms ± 0%  -47.81%  (p=0.000 n=49+50)
BM_eigen_erfc_float/1M    5.47ms ± 1%  2.86ms ± 0%  -47.78%  (p=0.000 n=52+45)

SSE 4.2, |x| > 1:
name                      old cpu/op   new cpu/op   delta
BM_eigen_erfc_float/64    1.42µs ± 1%  0.74µs ± 1%  -48.27%  (p=0.000 n=53+47)
BM_eigen_erfc_float/512   11.5µs ± 1%  5.9µs ± 4%   -48.28%  (p=0.000 n=55+51)
BM_eigen_erfc_float/4k    92.1µs ± 3%  48.9µs ± 9%  -46.89%  (p=0.000 n=51+60)
BM_eigen_erfc_float/32k   739µs ± 2%   389µs ± 9%   -47.30%  (p=0.000 n=51+48)
BM_eigen_erfc_float/256k  5.92ms ± 1%  3.06ms ± 1%  -48.27%  (p=0.000 n=52+42)
BM_eigen_erfc_float/1M    23.7ms ± 1%  12.3ms ± 4%  -47.92%  (p=0.000 n=41+52)

AVX2+FMA, |x| < 1:
name                      old cpu/op   new cpu/op   delta
BM_eigen_erfc_float/8     39.7ns ± 3%  13.3ns ± 0%  -66.42%  (p=0.000 n=47+51)
BM_eigen_erfc_float/64    337ns ± 2%   92ns ± 0%    -72.68%  (p=0.000 n=41+54)
BM_eigen_erfc_float/512   2.68µs ± 1%  0.73µs ± 0%  -72.88%  (p=0.000 n=49+57)
BM_eigen_erfc_float/4k    21.4µs ± 1%  5.8µs ± 0%   -72.77%  (p=0.000 n=52+54)
BM_eigen_erfc_float/32k   171µs ± 1%   46µs ± 1%    -72.90%  (p=0.000 n=49+56)
BM_eigen_erfc_float/256k  1.37ms ± 1%  0.37ms ± 1%  -72.91%  (p=0.000 n=53+45)
BM_eigen_erfc_float/1M    5.48ms ± 1%  1.49ms ± 1%  -72.84%  (p=0.000 n=54+56)

AVX2+FMA, |x| > 1:
name                      old cpu/op   new cpu/op   delta
BM_eigen_erfc_float/8     164ns ± 3%   65ns ± 4%    -60.55%  (p=0.000 n=48+57)
BM_eigen_erfc_float/64    1.42µs ± 1%  0.55µs ± 3%  -61.29%  (p=0.000 n=47+58)
BM_eigen_erfc_float/512   11.5µs ± 2%  4.4µs ± 5%   -61.46%  (p=0.000 n=47+60)
BM_eigen_erfc_float/4k    91.9µs ± 1%  35.4µs ± 3%  -61.49%  (p=0.000 n=46+49)
BM_eigen_erfc_float/32k   738µs ± 2%   284µs ± 3%   -61.48%  (p=0.000 n=47+54)
BM_eigen_erfc_float/256k  5.92ms ± 4%  2.28ms ± 3%  -61.54%  (p=0.000 n=48+60)
BM_eigen_erfc_float/1M    23.8ms ± 6%  9.1ms ± 3%   -61.72%  (p=0.000 n=46+60)
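
The delta column reports the relative change in time per op, so it can be easier to read as a speedup factor. A small sketch of the conversion follows, assuming delta is defined as new/old - 1; `speedup_from_delta` is a hypothetical helper for illustration and not part of Eigen or the benchmark tooling.

```cpp
#include <cassert>

// Convert a delta (percent change in time per op, negative meaning faster)
// into a speedup factor old_time / new_time.
// Assumes delta = (new_time / old_time - 1) * 100.
inline double speedup_from_delta(double delta_percent) {
  assert(delta_percent > -100.0);  // -100% would mean zero time per op
  return 1.0 / (1.0 + delta_percent / 100.0);
}
```

Under that assumption, the SSE 4.2 deltas near -47.8% correspond to a speedup of roughly 1.9x, and the AVX2+FMA deltas near -72.8% for |x| < 1 to roughly 3.7x.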
Edited by Rasmus Munk Larsen