Vectorize the sign operator in Eigen.

This fixes an old TODO to vectorize scalar_sign_op for real types.
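For readers unfamiliar with the operation, the following is a minimal scalar sketch of what a sign operator computes for real and complex inputs. It is not Eigen's implementation: the helper names `sign_real`, `sign_complex`, and `sign_array` are hypothetical, and the conventions (sign(0) = 0, complex sign defined as a / |a|, NaN mapped to 0 in the real case) are assumptions made for illustration. The real-valued version uses only comparisons, which is the kind of branch-free form that maps naturally onto SIMD compare and select instructions.

```cpp
#include <cmath>
#include <complex>
#include <cstddef>

// Hypothetical helper (not Eigen's code): sign of a real value as -1, 0 or +1.
// Uses comparisons only, with no branches. With this formulation a NaN input
// yields 0, which is an assumption of this sketch.
template <typename T>
T sign_real(T a) {
  return static_cast<T>((a > T(0)) - (a < T(0)));
}

// Hypothetical helper: sign of a complex value, defined here as a / |a|,
// with sign(0) = 0 to avoid dividing by zero.
template <typename T>
std::complex<T> sign_complex(const std::complex<T>& a) {
  const T mag = std::abs(a);
  if (mag == T(0)) return std::complex<T>(T(0), T(0));
  const T inv = T(1) / mag;  // one division, then two multiplications
  return std::complex<T>(a.real() * inv, a.imag() * inv);
}

// Apply the real-valued sign elementwise. A loop of this shape is what a
// packet (SIMD) implementation would process several elements at a time.
template <typename T>
void sign_array(const T* in, T* out, std::size_t n) {
  for (std::size_t i = 0; i < n; ++i) out[i] = sign_real(in[i]);
}
```

For example, under these assumptions `sign_real(-2.5)` evaluates to `-1.0`, and `sign_complex(std::complex<double>(3.0, 4.0))` is approximately `(0.6, 0.8)`.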

Measured speedup on Intel(R) Xeon(R) Gold 6154 CPU @ 3.00GHz:

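(ctype == std::complex<float>, cdtype == std::complex<double>; the columns in each listing are benchmark name, mean wall time, mean CPU time, and iteration count)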

--march=nehalem (SSE*)
before:
BM_eigen_sign_double/8_mean              6.44           6.41   109201388  
BM_eigen_sign_double/64_mean            34.0           34.0     20234383  
BM_eigen_sign_double/512_mean          260            260        2666898  
BM_eigen_sign_double/2k_mean          1016           1016         687992  
BM_eigen_sign_float/8_mean               9.71           9.71    72262781  
BM_eigen_sign_float/64_mean             21.7           21.7     32433773  
BM_eigen_sign_float/512_mean           117            117        5997151  
BM_eigen_sign_float/2k_mean            449            448        1200000  
BM_eigen_sign_ctype/8_mean              57.2           57.2     12000000  
BM_eigen_sign_ctype/64_mean            509            509        1200000  
BM_eigen_sign_ctype/512_mean          4095           4086         167682  
BM_eigen_sign_ctype/2k_mean          16289          16288          42865  
BM_eigen_sign_cdtype/8_mean             75.5           75.2      9199233  
BM_eigen_sign_cdtype/64_mean           722            722         982228  
BM_eigen_sign_cdtype/512_mean         6704           6682         105444  
BM_eigen_sign_cdtype/2k_mean         27827          27832          24978  

after:
BM_eigen_sign_double/8_mean              5.74           5.74   120000000  
BM_eigen_sign_double/64_mean            31.7           31.7     21927020  
BM_eigen_sign_double/512_mean          252            252        2706623  
BM_eigen_sign_double/2k_mean           982            982         712900  
BM_eigen_sign_float/8_mean               9.68           9.68    72326995  *not vectorized
BM_eigen_sign_float/64_mean             21.4           21.4     32621349  *not vectorized
BM_eigen_sign_float/512_mean           116            116        6013897  *not vectorized
BM_eigen_sign_float/2k_mean            447            447        1200000  *not vectorized
BM_eigen_sign_ctype/8_mean              15.6           15.6     45293266  
BM_eigen_sign_ctype/64_mean             88.6           88.6      7900154  
BM_eigen_sign_ctype/512_mean           680            680        1031326  
BM_eigen_sign_ctype/2k_mean           2689           2691         257909  
BM_eigen_sign_cdtype/8_mean             28.4           28.4     24440221  
BM_eigen_sign_cdtype/64_mean           257            257        2711124  
BM_eigen_sign_cdtype/512_mean         2077           2077         336557  
BM_eigen_sign_cdtype/2k_mean          8312           8313          84323  

 
--march=skylake   (AVX2)
before:
BM_eigen_sign_double/8_mean             12.0           12.0     58310104  
BM_eigen_sign_double/64_mean            38.5           38.5     17869356  
BM_eigen_sign_double/512_mean          250            249        2775578  
BM_eigen_sign_double/2k_mean           996            996         714018  
BM_eigen_sign_float/8_mean              12.7           12.7     55495035  
BM_eigen_sign_float/64_mean             32.4           32.4     21378522  
BM_eigen_sign_float/512_mean           122            122        5698877  
BM_eigen_sign_float/2k_mean            414            413        1618011  
BM_eigen_sign_ctype/8_mean              58.2           58.2     11965594  
BM_eigen_sign_ctype/64_mean            518            518        1200000  
BM_eigen_sign_ctype/512_mean          4080           4063         169364  
BM_eigen_sign_ctype/2k_mean          16333          16332          42414  
BM_eigen_sign_cdtype/8_mean             79.7           79.7      8728939  
BM_eigen_sign_cdtype/64_mean           718            717         971102  
BM_eigen_sign_cdtype/512_mean         6705           6700         105340  
BM_eigen_sign_cdtype/2k_mean         28451          28447          24539  
 
after:
BM_eigen_sign_double/8_mean             10.2           10.2     68883377  
BM_eigen_sign_double/64_mean            27.8           27.8     25028814  
BM_eigen_sign_double/512_mean          169            169        4134488  
BM_eigen_sign_double/2k_mean           631            631        1102968  
BM_eigen_sign_float/8_mean              16.4           16.4     42694836  
BM_eigen_sign_float/64_mean             21.1           21.1     33109093  
BM_eigen_sign_float/512_mean            96.9           96.9      7209604  
BM_eigen_sign_float/2k_mean            326            326        2110066  
BM_eigen_sign_ctype/8_mean              27.7           27.7     25070458   
BM_eigen_sign_ctype/64_mean             96.1           96.1      7270548  
BM_eigen_sign_ctype/512_mean           634            634        1102494  
BM_eigen_sign_ctype/2k_mean           2467           2467         280365  
BM_eigen_sign_cdtype/8_mean             28.3           28.3     24573556  
BM_eigen_sign_cdtype/64_mean           241            241        2869555  
BM_eigen_sign_cdtype/512_mean         1946           1946         358788  
BM_eigen_sign_cdtype/2k_mean          7793           7793          89187


--march=skylake-avx512  (AVX512)
before:
BM_eigen_sign_double/8_mean             11.5           11.5     61014691  
BM_eigen_sign_double/64_mean            41.5           41.5     16519411  
BM_eigen_sign_double/512_mean          285            285        2438747  
BM_eigen_sign_double/2k_mean          1140           1140         598276  
BM_eigen_sign_float/8_mean              11.5           11.5     61125484  
BM_eigen_sign_float/64_mean             29.4           29.4     23598559  
BM_eigen_sign_float/512_mean           103            103        6759891  
BM_eigen_sign_float/2k_mean            371            371        1851622  
BM_eigen_sign_ctype/8_mean              58.4           58.4     11787688  
BM_eigen_sign_ctype/64_mean            509            509        1200000  
BM_eigen_sign_ctype/512_mean          4091           4093         168895  
BM_eigen_sign_ctype/2k_mean          16314          16295          42802  
BM_eigen_sign_cdtype/8_mean             79.6           79.6      8605987  
BM_eigen_sign_cdtype/64_mean           717            717         965781  
BM_eigen_sign_cdtype/512_mean         6831           6828         101197  
BM_eigen_sign_cdtype/2k_mean         28749          28743          24183  
 
after:
BM_eigen_sign_double/8_mean             16.4           16.4     42809039  
BM_eigen_sign_double/64_mean            23.2           23.2     30221446  
BM_eigen_sign_double/512_mean           74.3           74.3      9397251  
BM_eigen_sign_double/2k_mean           258            259        2604931  
BM_eigen_sign_float/8_mean              16.5           16.5     42515449  
BM_eigen_sign_float/64_mean             31.3           31.3     22136770  
BM_eigen_sign_float/512_mean            60.9           60.9     11516560  
BM_eigen_sign_float/2k_mean            153            153        4570941  
BM_eigen_sign_ctype/8_mean              62.7           62.7     10956854  
BM_eigen_sign_ctype/64_mean            121            121        5783435  
BM_eigen_sign_ctype/512_mean           501            501        1200000  
BM_eigen_sign_ctype/2k_mean           1835           1835         379651  
BM_eigen_sign_cdtype/8_mean             57.7           57.8     11825262  
BM_eigen_sign_cdtype/64_mean           203            203        3418327  
BM_eigen_sign_cdtype/512_mean         1392           1392         497932  
BM_eigen_sign_cdtype/2k_mean          5406           5406         120000  
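
To read the numbers above as speedups, divide the "before" time by the "after" time for the same benchmark. The sketch below does this for the 2k-element AVX512 rows, with the times copied from the first column of the listings above. The names `speedup`, `Row`, and `print_avx512_2k_speedups` are hypothetical helpers introduced only for this illustration.

```cpp
#include <cstdio>

// Speedup as the ratio of mean time before the change to mean time after it.
double speedup(double before, double after) { return before / after; }

// One benchmark row: name plus the mean times taken from the listings.
struct Row {
  const char* name;
  double before;
  double after;
};

// Print the speedup for the 2k-element AVX512 benchmarks.
void print_avx512_2k_speedups() {
  const Row rows[] = {
      {"double/2k", 1140.0, 258.0},
      {"float/2k", 371.0, 153.0},
      {"ctype/2k", 16314.0, 1835.0},
      {"cdtype/2k", 28749.0, 5406.0},
  };
  for (const Row& r : rows) {
    std::printf("%-10s %.2fx\n", r.name, speedup(r.before, r.after));
  }
}
```

The same ratio can be applied to any other pair of rows, for instance to see that the smallest sizes (8 elements) can get slower with wider vectors.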
Edited by Rasmus Munk Larsen