Vectorize allFinite()

For example, with AVX this speeds up allFinite() by roughly 2.75x to 2.86x across the benchmarked sizes.
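
This merge request's actual code is not shown here. As a rough illustration of why vectorizing this check pays off, the sketch below contrasts a scalar early-exit loop with a branch-free, block-wise variant that compilers can often auto-vectorize. It is a minimal sketch that assumes IEEE-754 doubles stored in a plain array. The names `allFiniteScalar` and `allFiniteBlocked` and the block size of 64 are hypothetical. This is not the library's API and not necessarily the technique this change uses.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <limits>

// Assumption: doubles use the IEEE-754 binary64 layout.
static_assert(std::numeric_limits<double>::is_iec559, "IEEE-754 doubles assumed");

// Hypothetical helper names for illustration; not the library's API.

// Scalar baseline: returns at the first non-finite value. The data-dependent
// branch in every iteration typically keeps compilers from vectorizing it.
bool allFiniteScalar(const double* data, std::size_t n) {
  assert(data != nullptr || n == 0);
  for (std::size_t i = 0; i < n; ++i) {
    if (!std::isfinite(data[i])) return false;
  }
  return true;
}

// Branch-free variant: a double is Inf or NaN exactly when all 11 exponent
// bits are set. Within each block we OR together per-element flags with no
// branches (a pattern compilers can often turn into SIMD code) and test the
// accumulated flag only once per block.
bool allFiniteBlocked(const double* data, std::size_t n) {
  assert(data != nullptr || n == 0);
  constexpr std::uint64_t kExponentMask = 0x7FF0000000000000ULL;
  constexpr std::size_t kBlock = 64;  // hypothetical, untuned block size
  std::size_t i = 0;
  while (i < n) {
    const std::size_t end = (n - i < kBlock) ? n : i + kBlock;
    std::uint64_t nonFinite = 0;
    for (; i < end; ++i) {
      std::uint64_t bits;
      std::memcpy(&bits, &data[i], sizeof bits);  // well-defined type pun
      nonFinite |= static_cast<std::uint64_t>((bits & kExponentMask) == kExponentMask);
    }
    if (nonFinite != 0) return false;  // early exit at block granularity only
  }
  return true;
}
```

Both functions return `false` for an array that contains a single infinity or NaN, and `true` for an all-finite array, including `-0.0` and subnormals. Testing the flag once per block wastes a little work after a non-finite value, but in exchange the inner loop is branch-free.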

before:

CPU: Intel Skylake Xeon with HyperThreading (36 cores) dL1:32KB dL2:1024KB dL3:24MB
Benchmark               Time(ns)        CPU(ns)     Iterations  Throughput
------------------------------------------------------------------------------
BM_allFinite/8_mean         38.9           38.9       17706244  1.645G items/s
BM_allFinite/64_mean        2318           2318         299977  1.767G items/s
BM_allFinite/512_mean     151051         151039           4619  1.737G items/s
BM_allFinite/1k_mean      611419         612018           1099  1.714G items/s


after:

CPU: Intel Skylake Xeon with HyperThreading (36 cores) dL1:32KB dL2:1024KB dL3:24MB
Benchmark               Time(ns)        CPU(ns)     Iterations  Throughput
------------------------------------------------------------------------------
BM_allFinite/8_mean         13.6           13.6       51560404  4.708G items/s
BM_allFinite/64_mean         843            843         830627  4.861G items/s
BM_allFinite/512_mean      54340          54352          12000  4.823G items/s
BM_allFinite/1k_mean      221361         221362           3168  4.737G items/s

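The speedup quoted above is the ratio of the before and after mean times. The throughput column also fits a benchmark that processes an N x N matrix. For example, 64 items / 38.9 ns ≈ 1.645G items/s. That reading, and treating `1k` as N = 1024, are inferences from the numbers and are not stated in this description. The sketch below recomputes both from the tabulated `Time(ns)` means. `BenchRow`, `speedup`, `gItemsPerSec`, and `printSummary` are hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <cstdio>

// Hypothetical record type; values transcribed from the tables above.
struct BenchRow {
  const char* label;  // benchmark suffix, e.g. "512" or "1k"
  double n;           // matrix side length (inferred; "1k" taken as 1024)
  double beforeNs;    // mean Time(ns) before the change
  double afterNs;     // mean Time(ns) after the change
};

const std::array<BenchRow, 4> kRows = {{
    {"8", 8, 38.9, 13.6},
    {"64", 64, 2318, 843},
    {"512", 512, 151051, 54340},
    {"1k", 1024, 611419, 221361},
}};

// Speedup is the ratio of the two mean times.
double speedup(const BenchRow& r) {
  assert(r.afterNs > 0);
  return r.beforeNs / r.afterNs;
}

// N*N items per nanosecond equals billions of items per second (G items/s).
double gItemsPerSec(double n, double ns) {
  assert(ns > 0);
  return n * n / ns;
}

// Prints one line per benchmark size: speedup and recomputed throughputs.
void printSummary() {
  for (const BenchRow& r : kRows) {
    std::printf("%-4s speedup %.2fx  before %.3fG/s  after %.3fG/s\n", r.label,
                speedup(r), gItemsPerSec(r.n, r.beforeNs),
                gItemsPerSec(r.n, r.afterNs));
  }
}
```

Calling `printSummary()` reports speedups ranging from about 2.75x (64) to 2.86x (8). The recomputed throughputs agree with the table to within about 0.003G items/s.
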
Edited by Rasmus Munk Larsen
