Vectorize Visitor.h.

This change adds a vectorized code path in Visitor.h, which speeds up maxCoeff(&row, &col) and the other visitor-based reductions by about 5x on machines with AVX2.
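For readers unfamiliar with the idea, here is a minimal sketch of one common way to vectorize an argmax search. It is not the code in this change: `MaxLocation`, `BlockedMaxCoeff` and `kLanes` are hypothetical names, the inner loops stand in for SIMD instructions, and it assumes column-major storage with no NaN entries.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Result of the search: the maximum value and its position.
struct MaxLocation {
  float value;
  std::size_t row;
  std::size_t col;
};

// Hypothetical helper, not part of Eigen. Scans a column-major rows x cols
// matrix in packets of kLanes contiguous entries.
MaxLocation BlockedMaxCoeff(const std::vector<float>& data, std::size_t rows,
                            std::size_t cols) {
  assert(rows > 0 && cols > 0 && data.size() == rows * cols);
  constexpr std::size_t kLanes = 8;  // assumed: 8 floats per 256-bit register
  MaxLocation best{data[0], 0, 0};
  for (std::size_t j = 0; j < cols; ++j) {
    const float* column = data.data() + j * rows;
    std::size_t i = 0;
    for (; i + kLanes <= rows; i += kLanes) {
      // Step 1: branch-free maximum of the packet (a SIMD max + reduction).
      float packet_max = column[i];
      for (std::size_t l = 1; l < kLanes; ++l) {
        packet_max = column[i + l] > packet_max ? column[i + l] : packet_max;
      }
      // Step 2: only when the packet beats the running maximum, find the lane.
      if (packet_max > best.value) {
        for (std::size_t l = 0; l < kLanes; ++l) {
          if (column[i + l] == packet_max) {
            best = {packet_max, i + l, j};
            break;
          }
        }
      }
    }
    // Scalar tail for the rows that do not fill a whole packet.
    for (; i < rows; ++i) {
      if (column[i] > best.value) best = {column[i], i, j};
    }
  }
  return best;
}
```

The point of the two-step structure is that the index bookkeeping, which is what makes a scalar visitor slow, only runs on the rare packets that improve the running maximum.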

Benchmark of maxCoeff(&row, &col) on random square matrices of float:

name                  old cpu/op  new cpu/op  delta
BM_EigenCoeffMax/16    317ns ± 0%    73ns ± 0%  -77.16%  (p=0.000 n=44+47)
BM_EigenCoeffMax/64   5.30µs ± 0%  0.92µs ± 5%  -82.56%  (p=0.000 n=42+60)
BM_EigenCoeffMax/128  21.3µs ± 0%   3.6µs ± 1%  -83.21%  (p=0.000 n=45+48)
BM_EigenCoeffMax/512   341µs ± 0%    56µs ± 0%  -83.65%  (p=0.000 n=38+60)
BM_EigenCoeffMax/1k   1.42ms ± 0%  0.24ms ± 1%  -83.31%  (p=0.000 n=36+33)

This also speeds up various matrix decompositions that perform pivot search using maxCoeff:

name                      old cpu/op  new cpu/op  delta
BM_EigenPartialPivLU/16   1.99µs ± 1%  1.96µs ± 1%   -1.32%  (p=0.000 n=60+59)
BM_EigenPartialPivLU/64   23.2µs ± 1%  21.7µs ± 2%   -6.63%  (p=0.000 n=56+58)
BM_EigenPartialPivLU/128   116µs ± 2%   108µs ± 2%   -6.56%  (p=0.000 n=60+60)
BM_EigenPartialPivLU/512  3.53ms ± 1%  3.40ms ± 2%   -3.83%  (p=0.000 n=38+38)
BM_EigenPartialPivLU/1k   17.0ms ± 1%  16.4ms ± 1%   -3.98%  (p=0.000 n=29+27)
BM_EigenFullPivLU/16      3.17µs ± 1%  2.76µs ± 1%  -12.99%  (p=0.000 n=49+50)
BM_EigenFullPivLU/64      79.2µs ± 2%  53.3µs ± 3%  -32.75%  (p=0.000 n=58+56)
BM_EigenFullPivLU/128      560µs ± 2%   361µs ± 3%  -35.61%  (p=0.000 n=60+50)
BM_EigenFullPivLU/512     26.7ms ± 3%  16.5ms ± 2%  -38.26%  (p=0.000 n=47+47)
BM_EigenFullPivLU/1k       234ms ± 3%   165ms ± 4%  -29.52%  (p=0.000 n=15+21)
BM_EigencolPivQR/16       4.61µs ± 3%  4.61µs ± 4%    ~     (p=0.881 n=58+59)
BM_EigencolPivQR/64       51.7µs ± 2%  51.0µs ± 2%   -1.44%  (p=0.000 n=58+57)
BM_EigencolPivQR/128       277µs ± 3%   272µs ± 3%   -1.97%  (p=0.000 n=55+54)
BM_EigencolPivQR/512      9.05ms ± 3%  9.00ms ± 2%    ~     (p=0.197 n=45+44)
BM_EigencolPivQR/1k        127ms ± 4%   127ms ± 5%    ~     (p=0.421 n=27+26)
BM_EigenfullPivQR/16      5.45µs ± 3%  5.02µs ± 4%   -7.78%  (p=0.000 n=59+60)
BM_EigenfullPivQR/64       108µs ± 3%    76µs ± 4%  -29.07%  (p=0.000 n=59+59)
BM_EigenfullPivQR/128      682µs ± 3%   452µs ± 2%  -33.78%  (p=0.000 n=59+57)
BM_EigenfullPivQR/512     33.1ms ± 4%  20.0ms ± 3%  -39.57%  (p=0.000 n=44+40)
BM_EigenfullPivQR/1k       323ms ± 1%   225ms ± 3%  -30.20%   (p=0.000 n=8+15)
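The full-pivoting decompositions gain the most in the table above. A plausible reason is the size of the search at each elimination step: partial pivoting scans one column, while full pivoting scans the whole remaining submatrix. The sketch below illustrates the two searches in plain C++; `PartialPivotRow` and `FullPivot` are hypothetical helpers over a column-major n x n matrix, not the Eigen implementation.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper: row index of the largest |a(i, k)| for i in [k, n).
// This is an O(n - k) search over a single column.
std::size_t PartialPivotRow(const std::vector<double>& a, std::size_t n,
                            std::size_t k) {
  assert(k < n && a.size() == n * n);
  std::size_t best_row = k;
  for (std::size_t i = k + 1; i < n; ++i) {
    if (std::abs(a[k * n + i]) > std::abs(a[k * n + best_row])) best_row = i;
  }
  return best_row;
}

// Hypothetical helper: (row, col) of the largest |a(i, j)| for i, j in [k, n).
// This is an O((n - k)^2) search, so it dominates more of the total runtime.
std::pair<std::size_t, std::size_t> FullPivot(const std::vector<double>& a,
                                              std::size_t n, std::size_t k) {
  assert(k < n && a.size() == n * n);
  std::size_t best_row = k;
  std::size_t best_col = k;
  for (std::size_t j = k; j < n; ++j) {
    for (std::size_t i = k; i < n; ++i) {
      if (std::abs(a[j * n + i]) > std::abs(a[best_col * n + best_row])) {
        best_row = i;
        best_col = j;
      }
    }
  }
  return {best_row, best_col};
}
```

Summed over all n elimination steps, the full-pivot search costs on the order of n^3 comparisons, the same order as the elimination itself, whereas the partial-pivot search costs on the order of n^2.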

Closes #2345 (closed)

Edited by Rasmus Munk Larsen