    Vec: add GEMV optimizations for VecMDot and friends for VecStandard · b29a8671
    Junchao Zhang authored
    Remove KSPPIPEFGMRES from the example that skips the convergence test, since it is very sensitive to a happy ending
    
    The GEMV path appears to have a sweet spot of much better performance for smallish
    vectors (roughly 2.4x to 2.8x at length 37,636), then matches the unrolled code for large vectors
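
As a rough illustration of why several dot products against one vector can be handed to GEMV, here is a minimal pure-Python sketch. It assumes real scalars and that the vectors are stored back to back in one array, so they can be read as the columns of a column-major matrix. The function names are hypothetical and are not PETSc API. In a real implementation a BLAS GEMV call would replace the double loop.

```python
# Sketch only: pure-Python stand-ins, not PETSc code. All names are hypothetical.

def mdot_loop(x, ys):
    """One dot product per vector: result[i] = sum_k ys[i][k] * x[k]."""
    return [sum(yk * xk for yk, xk in zip(y, x)) for y in ys]


def pack_columns(ys):
    """Store the vectors back to back, so vector i occupies a[i*n:(i+1)*n].

    Read as a column-major n-by-m matrix A, column i of A is ys[i].
    """
    return [value for y in ys for value in y]


def mdot_gemv(x, a, n, m):
    """Transposed matrix-vector product result = A^T x on the packed array.

    This is the operation a BLAS GEMV call (transpose mode) would perform.
    """
    result = [0.0] * m
    for i in range(m):
        base = i * n          # start of column i in the packed storage
        acc = 0.0
        for k in range(n):
            acc += a[base + k] * x[k]
        result[i] = acc
    return result


x = [1.0, 2.0, 3.0]
ys = [[1.0, 0.0, 0.0], [1.0, 1.0, 1.0], [0.5, -1.0, 2.0]]
a = pack_columns(ys)

# Both formulations compute the same m dot products.
print(mdot_loop(x, ys))
print(mdot_gemv(x, a, n=3, m=3))
```

The point of the reformulation is that a single library call sees all the vectors at once and can reuse each loaded entry of x across columns, which is one plausible reason for the gain at sizes that fit in cache.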
    
    Sample results on Barry's Apple M2 Laptop (using Apple's BLAS)
    
    ./ex19 -da_refine 5 -pc_type none -log_view -ksp_gmres_preallocate -ksp_view
    
    Vector length 37,636
    
    VecMDot             1920 1.0 1.9707e-01 1.0 2.23e+09 1.0 0.0e+00 0.0e+00 0.0e+00 25 29  0  0  0  25 29  0  0  0 11291
    
    -vec_mdot_use_gemv
    
    VecMDot             1920 1.0 7.5098e-02 1.0 2.23e+09 1.0 0.0e+00 0.0e+00 0.0e+00 12 29  0  0  0  12 29  0  0  0 29693
    VecMDot             1920 1.0 8.1523e-02 1.0 2.23e+09 1.0 0.0e+00 0.0e+00 0.0e+00 12 29  0  0  0  12 29  0  0  0 27353
    VecMDot             1920 1.0 7.0889e-02 1.0 2.23e+09 1.0 0.0e+00 0.0e+00 0.0e+00 11 29  0  0  0  11 29  0  0  0 31456
    
    -da_refine 6
    
    Vector length 148,996
    
    VecMDot             4340 1.0 1.7666e+00 1.0 2.00e+10 1.0 0.0e+00 0.0e+00 0.0e+00 20 29  0  0  0  20 29  0  0  0 11319
    
    -vec_mdot_use_gemv
    
    VecMDot             4422 1.0 1.3725e+00 1.0 2.04e+10 1.0 0.0e+00 0.0e+00 0.0e+00 15 29  0  0  0  15 29  0  0  0 14884
    VecMDot             4422 1.0 1.4354e+00 1.0 2.04e+10 1.0 0.0e+00 0.0e+00 0.0e+00 16 29  0  0  0  16 29  0  0  0 14231
    
    ./ex19 -da_refine 7 -pc_type none -log_view -ksp_gmres_preallocate -ksp_view -vec_mdot_use_gemv -ksp_max_it 100 -snes_max_it 1
    
    Vector length 592,900
    
    VecMDot              100 1.0 1.5915e-01 1.0 1.72e+09 1.0 0.0e+00 0.0e+00 0.0e+00 14 27  0  0  0  14 27  0  0  0 10804
    
    -vec_mdot_use_gemv
    
    VecMDot              100 1.0 1.6854e-01 1.0 1.72e+09 1.0 0.0e+00 0.0e+00 0.0e+00 14 27  0  0  0  14 27  0  0  0 10230
    VecMDot              100 1.0 1.5698e-01 1.0 1.72e+09 1.0 0.0e+00 0.0e+00 0.0e+00 14 27  0  0  0  14 27  0  0  0 10983
    
    -da_refine 8
    
    Vector length 2,365,444
    
    VecMDot              100 1.0 6.2499e-01 1.0 6.86e+09 1.0 0.0e+00 0.0e+00 0.0e+00 13 27  0  0  0  13 27  0  0  0 10976
    
    -vec_mdot_use_gemv
    
    VecMDot              100 1.0 6.8197e-01 1.0 6.88e+09 1.0 0.0e+00 0.0e+00 0.0e+00 14 27  0  0  0  14 27  0  0  0 10087
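
The results above can be summarized as speedup ratios. The sketch below assumes that the last column of each `-log_view` line is the Mflop/s rate, and that the first line in each group is the run without -vec_mdot_use_gemv. It compares rates rather than times because the call counts differ between some runs (4340 versus 4422 at length 148,996).

```python
# Mflop/s figures copied from the log lines above.
# Layout: vector length -> (baseline rate, [rates with -vec_mdot_use_gemv]).
rates = {
    37636: (11291, [29693, 27353, 31456]),
    148996: (11319, [14884, 14231]),
    592900: (10804, [10230, 10983]),
    2365444: (10976, [10087]),
}


def speedups(rates):
    """Return, per vector length, the GEMV rate divided by the baseline rate."""
    return {
        length: [gemv / baseline for gemv in gemv_rates]
        for length, (baseline, gemv_rates) in rates.items()
    }


ratios = speedups(rates)
for length, values in ratios.items():
    formatted = ", ".join(f"{value:.2f}x" for value in values)
    print(f"{length:>9}: {formatted}")
```

A ratio above 1 means the GEMV path was faster, and a ratio near 1 means the two paths performed about the same.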