Vectorize the sign operator in Eigen.
This fixes an old TODO to vectorize scalar_sign_op for real types.
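For reference, a minimal scalar sketch of the semantics being vectorized. This is not the Eigen implementation: the names sign_real, sign_complex and sign_array are hypothetical, and the definitions assumed here are the usual ones (-1, 0 or +1 for real inputs, x / |x| with sign(0) == 0 for complex inputs). The real version uses only comparisons and a subtraction, which is the shape that maps onto packet compare and select instructions.

```cpp
#include <cmath>
#include <complex>
#include <cstddef>

// Hypothetical helper: branch-free sign for real types.
// Returns -1, 0 or +1. Both comparisons are false for NaN, so NaN maps to 0
// in this sketch (an assumption, not a statement about Eigen's behavior).
template <typename T>
T sign_real(T x) {
  return static_cast<T>((x > T(0)) - (x < T(0)));
}

// Hypothetical helper: sign for complex types, defined as x / |x|,
// with sign(0) taken to be 0 to avoid dividing by zero.
template <typename T>
std::complex<T> sign_complex(const std::complex<T>& x) {
  const T mag = std::abs(x);
  return mag == T(0) ? std::complex<T>(T(0)) : x / mag;
}

// Element-wise loop over a buffer. Because sign_real has no data-dependent
// branches, a compiler or a hand-written packet path can process several
// elements per instruction.
template <typename T>
void sign_array(const T* in, T* out, std::size_t n) {
  for (std::size_t i = 0; i < n; ++i) {
    out[i] = sign_real(in[i]);
  }
}
```

The complex case needs an absolute value and a division per element.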
Measured speedup on Intel(R) Xeon(R) Gold 6154 CPU @ 3.00GHz
(ctype == std::complex<float>, cdtype == std::complex<double>;
columns: benchmark name, time, CPU time, iterations):
--march=nehalem (SSE*)
before:
BM_eigen_sign_double/8_mean 6.44 6.41 109201388
BM_eigen_sign_double/64_mean 34.0 34.0 20234383
BM_eigen_sign_double/512_mean 260 260 2666898
BM_eigen_sign_double/2k_mean 1016 1016 687992
BM_eigen_sign_float/8_mean 9.71 9.71 72262781
BM_eigen_sign_float/64_mean 21.7 21.7 32433773
BM_eigen_sign_float/512_mean 117 117 5997151
BM_eigen_sign_float/2k_mean 449 448 1200000
BM_eigen_sign_ctype/8_mean 57.2 57.2 12000000
BM_eigen_sign_ctype/64_mean 509 509 1200000
BM_eigen_sign_ctype/512_mean 4095 4086 167682
BM_eigen_sign_ctype/2k_mean 16289 16288 42865
BM_eigen_sign_cdtype/8_mean 75.5 75.2 9199233
BM_eigen_sign_cdtype/64_mean 722 722 982228
BM_eigen_sign_cdtype/512_mean 6704 6682 105444
BM_eigen_sign_cdtype/2k_mean 27827 27832 24978
after:
BM_eigen_sign_double/8_mean 5.74 5.74 120000000
BM_eigen_sign_double/64_mean 31.7 31.7 21927020
BM_eigen_sign_double/512_mean 252 252 2706623
BM_eigen_sign_double/2k_mean 982 982 712900
BM_eigen_sign_float/8_mean 9.68 9.68 72326995 *not vectorized
BM_eigen_sign_float/64_mean 21.4 21.4 32621349 *not vectorized
BM_eigen_sign_float/512_mean 116 116 6013897 *not vectorized
BM_eigen_sign_float/2k_mean 447 447 1200000 *not vectorized
BM_eigen_sign_ctype/8_mean 15.6 15.6 45293266
BM_eigen_sign_ctype/64_mean 88.6 88.6 7900154
BM_eigen_sign_ctype/512_mean 680 680 1031326
BM_eigen_sign_ctype/2k_mean 2689 2691 257909
BM_eigen_sign_cdtype/8_mean 28.4 28.4 24440221
BM_eigen_sign_cdtype/64_mean 257 257 2711124
BM_eigen_sign_cdtype/512_mean 2077 2077 336557
BM_eigen_sign_cdtype/2k_mean 8312 8313 84323
--march=skylake (AVX2)
before:
BM_eigen_sign_double/8_mean 12.0 12.0 58310104
BM_eigen_sign_double/64_mean 38.5 38.5 17869356
BM_eigen_sign_double/512_mean 250 249 2775578
BM_eigen_sign_double/2k_mean 996 996 714018
BM_eigen_sign_float/8_mean 12.7 12.7 55495035
BM_eigen_sign_float/64_mean 32.4 32.4 21378522
BM_eigen_sign_float/512_mean 122 122 5698877
BM_eigen_sign_float/2k_mean 414 413 1618011
BM_eigen_sign_ctype/8_mean 58.2 58.2 11965594
BM_eigen_sign_ctype/64_mean 518 518 1200000
BM_eigen_sign_ctype/512_mean 4080 4063 169364
BM_eigen_sign_ctype/2k_mean 16333 16332 42414
BM_eigen_sign_cdtype/8_mean 79.7 79.7 8728939
BM_eigen_sign_cdtype/64_mean 718 717 971102
BM_eigen_sign_cdtype/512_mean 6705 6700 105340
BM_eigen_sign_cdtype/2k_mean 28451 28447 24539
after:
BM_eigen_sign_double/8_mean 10.2 10.2 68883377
BM_eigen_sign_double/64_mean 27.8 27.8 25028814
BM_eigen_sign_double/512_mean 169 169 4134488
BM_eigen_sign_double/2k_mean 631 631 1102968
BM_eigen_sign_float/8_mean 16.4 16.4 42694836
BM_eigen_sign_float/64_mean 21.1 21.1 33109093
BM_eigen_sign_float/512_mean 96.9 96.9 7209604
BM_eigen_sign_float/2k_mean 326 326 2110066
BM_eigen_sign_ctype/8_mean 27.7 27.7 25070458
BM_eigen_sign_ctype/64_mean 96.1 96.1 7270548
BM_eigen_sign_ctype/512_mean 634 634 1102494
BM_eigen_sign_ctype/2k_mean 2467 2467 280365
BM_eigen_sign_cdtype/8_mean 28.3 28.3 24573556
BM_eigen_sign_cdtype/64_mean 241 241 2869555
BM_eigen_sign_cdtype/512_mean 1946 1946 358788
BM_eigen_sign_cdtype/2k_mean 7793 7793 89187
--march=skylake-avx512 (AVX512)
before:
BM_eigen_sign_double/8_mean 11.5 11.5 61014691
BM_eigen_sign_double/64_mean 41.5 41.5 16519411
BM_eigen_sign_double/512_mean 285 285 2438747
BM_eigen_sign_double/2k_mean 1140 1140 598276
BM_eigen_sign_float/8_mean 11.5 11.5 61125484
BM_eigen_sign_float/64_mean 29.4 29.4 23598559
BM_eigen_sign_float/512_mean 103 103 6759891
BM_eigen_sign_float/2k_mean 371 371 1851622
BM_eigen_sign_ctype/8_mean 58.4 58.4 11787688
BM_eigen_sign_ctype/64_mean 509 509 1200000
BM_eigen_sign_ctype/512_mean 4091 4093 168895
BM_eigen_sign_ctype/2k_mean 16314 16295 42802
BM_eigen_sign_cdtype/8_mean 79.6 79.6 8605987
BM_eigen_sign_cdtype/64_mean 717 717 965781
BM_eigen_sign_cdtype/512_mean 6831 6828 101197
BM_eigen_sign_cdtype/2k_mean 28749 28743 24183
after:
BM_eigen_sign_double/8_mean 16.4 16.4 42809039
BM_eigen_sign_double/64_mean 23.2 23.2 30221446
BM_eigen_sign_double/512_mean 74.3 74.3 9397251
BM_eigen_sign_double/2k_mean 258 259 2604931
BM_eigen_sign_float/8_mean 16.5 16.5 42515449
BM_eigen_sign_float/64_mean 31.3 31.3 22136770
BM_eigen_sign_float/512_mean 60.9 60.9 11516560
BM_eigen_sign_float/2k_mean 153 153 4570941
BM_eigen_sign_ctype/8_mean 62.7 62.7 10956854
BM_eigen_sign_ctype/64_mean 121 121 5783435
BM_eigen_sign_ctype/512_mean 501 501 1200000
BM_eigen_sign_ctype/2k_mean 1835 1835 379651
BM_eigen_sign_cdtype/8_mean 57.7 57.8 11825262
BM_eigen_sign_cdtype/64_mean 203 203 3418327
BM_eigen_sign_cdtype/512_mean 1392 1392 497932
BM_eigen_sign_cdtype/2k_mean 5406 5406 120000
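The speedup implied by the tables above is the ratio of the "before" time to the "after" time. The sketch below computes it for the 2k rows, taking the first numeric column as the time (an assumption based on the usual Google Benchmark column order). The Row struct and the speedup and print_speedups helpers are hypothetical names used only for this illustration. The values are copied from the listings above.

```cpp
#include <cstdio>

// One before/after pair taken from the listings above (first numeric column).
struct Row {
  const char* isa;
  const char* name;
  double before;
  double after;
};

// Speedup as a unitless ratio: values above 1 mean the new code is faster.
inline double speedup(double before, double after) { return before / after; }

static const Row kRows[] = {
    {"SSE", "double/2k", 1016, 982},     {"SSE", "float/2k", 449, 447},
    {"SSE", "ctype/2k", 16289, 2689},    {"SSE", "cdtype/2k", 27827, 8312},
    {"AVX2", "double/2k", 996, 631},     {"AVX2", "float/2k", 414, 326},
    {"AVX2", "ctype/2k", 16333, 2467},   {"AVX2", "cdtype/2k", 28451, 7793},
    {"AVX512", "double/2k", 1140, 258},  {"AVX512", "float/2k", 371, 153},
    {"AVX512", "ctype/2k", 16314, 1835}, {"AVX512", "cdtype/2k", 28749, 5406},
};

// Print one line per row with the ratio rounded to two decimals.
void print_speedups() {
  for (const Row& r : kRows) {
    std::printf("%-7s %-10s %.2fx\n", r.isa, r.name,
                speedup(r.before, r.after));
  }
}
```

For example, ctype/2k goes from 16289 to 2689 under SSE (about 6.1x) and from 16314 to 1835 under AVX512 (about 8.9x).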
Edited by Rasmus Munk Larsen