Vectorize Visitor.h.
This change adds a vectorized code path to Visitor.h, which speeds up maxCoeff(&row, &col) and similar visitor-based calls by about 5x on machines with AVX2.
Benchmark of maxCoeff(&row, &col) on a random square float matrix:
name                  old cpu/op   new cpu/op   delta
BM_EigenCoeffMax/16   317ns ± 0%   73ns ± 0%    -77.16%  (p=0.000 n=44+47)
BM_EigenCoeffMax/64   5.30µs ± 0%  0.92µs ± 5%  -82.56%  (p=0.000 n=42+60)
BM_EigenCoeffMax/128  21.3µs ± 0%  3.6µs ± 1%   -83.21%  (p=0.000 n=45+48)
BM_EigenCoeffMax/512  341µs ± 0%   56µs ± 0%    -83.65%  (p=0.000 n=38+60)
BM_EigenCoeffMax/1k   1.42ms ± 0%  0.24ms ± 1%  -83.31%  (p=0.000 n=36+33)
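As a rough illustration of why a max search that also reports (row, col) can be vectorized, here is a minimal sketch. It is not the code in Visitor.h: `MaxLocation`, `BlockedMaxCoeff`, and `kPacketSize` are hypothetical names, the matrix is assumed to be column-major and free of NaNs, and the packet is emulated with a plain inner loop rather than AVX2 intrinsics. The idea is to keep the per-packet reduction branch-free and to look up the index only when a packet actually improves on the running maximum.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical result type for this sketch; not an Eigen type.
struct MaxLocation {
  float value;
  std::size_t row;
  std::size_t col;
};

// Finds the largest coefficient of a column-major rows x cols matrix and
// its position. Assumes the data contains no NaNs.
inline MaxLocation BlockedMaxCoeff(const std::vector<float>& data,
                                   std::size_t rows, std::size_t cols) {
  assert(rows > 0 && cols > 0 && data.size() == rows * cols);
  constexpr std::size_t kPacketSize = 8;  // 8 floats fill one 256-bit register
  const std::size_t n = data.size();
  float best = data[0];
  std::size_t best_index = 0;
  std::size_t i = 0;
  for (; i + kPacketSize <= n; i += kPacketSize) {
    // Step 1: branch-free max over one packet (stands in for a SIMD max).
    float packet_max = data[i];
    for (std::size_t k = 1; k < kPacketSize; ++k) {
      packet_max = data[i + k] > packet_max ? data[i + k] : packet_max;
    }
    // Step 2: locate the element only if this packet beats the running max.
    if (packet_max > best) {
      best = packet_max;
      for (std::size_t k = 0; k < kPacketSize; ++k) {
        if (data[i + k] == packet_max) {
          best_index = i + k;
          break;
        }
      }
    }
  }
  // Scalar tail for the elements that do not fill a whole packet.
  for (; i < n; ++i) {
    if (data[i] > best) {
      best = data[i];
      best_index = i;
    }
  }
  // Column-major layout: linear index = col * rows + row.
  return {best, best_index % rows, best_index / rows};
}
```

Because the index lookup runs only on packets that raise the maximum, most iterations on random data execute just the reduction, which is the part a compiler or hand-written SIMD code can map onto packet instructions.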
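This also speeds up matrix decompositions whose pivot search uses maxCoeff. Full-pivoting LU and QR gain about 30 to 40% at sizes 64 and above, partial-pivoting LU gains 1 to 7%, and column-pivoting QR is essentially unchanged: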
name                      old cpu/op   new cpu/op   delta
BM_EigenPartialPivLU/16   1.99µs ± 1%  1.96µs ± 1%  -1.32%   (p=0.000 n=60+59)
BM_EigenPartialPivLU/64   23.2µs ± 1%  21.7µs ± 2%  -6.63%   (p=0.000 n=56+58)
BM_EigenPartialPivLU/128  116µs ± 2%   108µs ± 2%   -6.56%   (p=0.000 n=60+60)
BM_EigenPartialPivLU/512  3.53ms ± 1%  3.40ms ± 2%  -3.83%   (p=0.000 n=38+38)
BM_EigenPartialPivLU/1k   17.0ms ± 1%  16.4ms ± 1%  -3.98%   (p=0.000 n=29+27)
BM_EigenFullPivLU/16      3.17µs ± 1%  2.76µs ± 1%  -12.99%  (p=0.000 n=49+50)
BM_EigenFullPivLU/64      79.2µs ± 2%  53.3µs ± 3%  -32.75%  (p=0.000 n=58+56)
BM_EigenFullPivLU/128     560µs ± 2%   361µs ± 3%   -35.61%  (p=0.000 n=60+50)
BM_EigenFullPivLU/512     26.7ms ± 3%  16.5ms ± 2%  -38.26%  (p=0.000 n=47+47)
BM_EigenFullPivLU/1k      234ms ± 3%   165ms ± 4%   -29.52%  (p=0.000 n=15+21)
BM_EigencolPivQR/16       4.61µs ± 3%  4.61µs ± 4%  ~        (p=0.881 n=58+59)
BM_EigencolPivQR/64       51.7µs ± 2%  51.0µs ± 2%  -1.44%   (p=0.000 n=58+57)
BM_EigencolPivQR/128      277µs ± 3%   272µs ± 3%   -1.97%   (p=0.000 n=55+54)
BM_EigencolPivQR/512      9.05ms ± 3%  9.00ms ± 2%  ~        (p=0.197 n=45+44)
BM_EigencolPivQR/1k       127ms ± 4%   127ms ± 5%   ~        (p=0.421 n=27+26)
BM_EigenfullPivQR/16      5.45µs ± 3%  5.02µs ± 4%  -7.78%   (p=0.000 n=59+60)
BM_EigenfullPivQR/64      108µs ± 3%   76µs ± 4%    -29.07%  (p=0.000 n=59+59)
BM_EigenfullPivQR/128     682µs ± 3%   452µs ± 2%   -33.78%  (p=0.000 n=59+57)
BM_EigenfullPivQR/512     33.1ms ± 4%  20.0ms ± 3%  -39.57%  (p=0.000 n=44+40)
BM_EigenfullPivQR/1k      323ms ± 1%   225ms ± 3%   -30.20%  (p=0.000 n=8+15)
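One plausible reading of the numbers above is that the two pivoting strategies spend very different amounts of time in the pivot search. The sketch below is a simplified illustration of the textbook pivot searches, not Eigen's decomposition code: `Pivot`, `PartialPivotSearch`, and `FullPivotSearch` are hypothetical names, and the matrix is assumed to be square, column-major, and stored in a flat vector. At elimination step k, partial pivoting scans one column tail (O(n) work), while full pivoting scans the whole trailing submatrix (O(n^2) work), so a faster max search matters much more for the latter.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical pivot position for this sketch; not an Eigen type.
struct Pivot {
  std::size_t row;
  std::size_t col;
};

// Partial pivoting: largest |a(r, k)| over rows r = k..n-1 of column k.
inline Pivot PartialPivotSearch(const std::vector<double>& a, std::size_t n,
                                std::size_t k) {
  assert(k < n && a.size() == n * n);
  Pivot pivot{k, k};
  double best = std::fabs(a[k * n + k]);  // column-major: a(r, c) = a[c*n + r]
  for (std::size_t r = k + 1; r < n; ++r) {
    const double v = std::fabs(a[k * n + r]);
    if (v > best) {
      best = v;
      pivot.row = r;
    }
  }
  return pivot;
}

// Full pivoting: largest |a(r, c)| over the trailing block r, c = k..n-1.
inline Pivot FullPivotSearch(const std::vector<double>& a, std::size_t n,
                             std::size_t k) {
  assert(k < n && a.size() == n * n);
  Pivot pivot{k, k};
  double best = std::fabs(a[k * n + k]);
  for (std::size_t c = k; c < n; ++c) {
    for (std::size_t r = k; r < n; ++r) {
      const double v = std::fabs(a[c * n + r]);
      if (v > best) {
        best = v;
        pivot = Pivot{r, c};
      }
    }
  }
  return pivot;
}
```

For a 3x3 matrix with columns {1, -4, 2}, {3, 0, -7}, {5, 6, 1}, the partial search at k = 0 picks row 1 of column 0, while the full search picks row 2 of column 1.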
Closes #2345
Edited by Rasmus Munk Larsen