Vectorize erf(x) for double.

Measured speedup (CPU time per op, lower is better; a negative delta means the new code is faster):

SSE 4.2:
name                      old cpu/op   new cpu/op   delta
BM_eigen_erf_double/1     3.00ns ± 1%  3.00ns ± 0%     ~     (p=0.455 n=48+47)
BM_eigen_erf_double/8     34.8ns ± 1%  17.9ns ± 8%  -48.42%  (p=0.000 n=45+59)
BM_eigen_erf_double/64     294ns ± 2%    80ns ±12%  -72.86%  (p=0.000 n=53+59)
BM_eigen_erf_double/512   2.32µs ± 3%  0.57µs ± 6%  -75.45%  (p=0.000 n=55+59)
BM_eigen_erf_double/4k    18.4µs ± 2%   4.5µs ± 6%  -75.59%  (p=0.000 n=50+60)
BM_eigen_erf_double/32k    147µs ± 1%    36µs ± 6%  -75.69%  (p=0.000 n=54+49)
BM_eigen_erf_double/256k  1.18ms ± 2%  0.29ms ± 5%  -75.83%  (p=0.000 n=55+55)
BM_eigen_erf_double/1M    4.76ms ± 3%  1.17ms ±11%  -75.34%  (p=0.000 n=58+58)

AVX2+FMA:
name                      old cpu/op   new cpu/op   delta
BM_eigen_erf_double/1     3.00ns ± 1%  3.54ns ± 1%  +18.14%  (p=0.000 n=47+46)
BM_eigen_erf_double/8     34.8ns ± 1%  20.5ns ± 7%  -41.28%  (p=0.000 n=45+54)
BM_eigen_erf_double/64     295ns ± 3%    65ns ±13%  -78.03%  (p=0.000 n=53+59)
BM_eigen_erf_double/512   2.32µs ± 3%  0.39µs ± 4%  -83.20%  (p=0.000 n=57+47)
BM_eigen_erf_double/4k    18.5µs ± 3%   3.0µs ± 7%  -83.63%  (p=0.000 n=57+53)
BM_eigen_erf_double/32k    148µs ± 3%    24µs ± 3%  -83.54%  (p=0.000 n=58+53)
BM_eigen_erf_double/256k  1.19ms ± 3%  0.21ms ± 4%  -82.05%  (p=0.000 n=57+55)
BM_eigen_erf_double/1M    4.75ms ± 2%  0.87ms ± 8%  -81.69%  (p=0.000 n=56+55)

AVX512:
name                      old cpu/op   new cpu/op   delta
BM_eigen_erf_double/1     3.01ns ± 1%  3.56ns ± 1%  +18.39%  (p=0.000 n=51+47)
BM_eigen_erf_double/8     35.1ns ± 3%  33.3ns ± 1%   -5.34%  (p=0.000 n=46+42)
BM_eigen_erf_double/64     306ns ± 9%    76ns ± 2%  -75.27%  (p=0.000 n=50+60)
BM_eigen_erf_double/512   2.39µs ± 8%  0.35µs ± 3%  -85.17%  (p=0.000 n=55+48)
BM_eigen_erf_double/4k    19.3µs ±12%   2.6µs ± 2%  -86.62%  (p=0.000 n=56+53)
BM_eigen_erf_double/32k    154µs ± 9%    20µs ± 3%  -86.70%  (p=0.000 n=55+60)
BM_eigen_erf_double/256k  1.23ms ± 7%  0.18ms ± 4%  -85.02%  (p=0.000 n=59+57)
BM_eigen_erf_double/1M    4.98ms ±12%  0.74ms ± 3%  -85.12%  (p=0.000 n=58+55)
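
To read the delta column as a speedup factor, the relative change in time per op can be inverted: a delta of d percent corresponds to a factor of 100 / (100 + d). The helper below is only an illustrative sketch; the name `speedup_from_delta` is hypothetical and is not part of Eigen or of the benchmark tooling.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper (not part of Eigen or the benchmark tool):
// converts a "delta" percentage from the tables above into a
// speedup factor, i.e. old time divided by new time.
//   new = old * (1 + delta/100)  =>  old / new = 100 / (100 + delta)
// A result above 1 means the new code is faster, below 1 means slower.
inline double speedup_from_delta(double delta_percent) {
  // A delta of -100% or less would mean zero or negative time per op.
  assert(delta_percent > -100.0);
  return 100.0 / (100.0 + delta_percent);
}
```

For example, the reported -75.45% for SSE 4.2 at size 512 corresponds to roughly a 4.1x speedup, -85.17% for AVX512 at size 512 to roughly 6.7x, and the +18.14% regression for AVX2+FMA at size 1 to a factor of about 0.85.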
