
Simplify CUPM VecMDot implementation

Junchao Zhang requested to merge jczhang/simplify-cupm-vecmdot into main

With rocm-5.7.1 on Noether, src/snes/tutorials/ex19 with -dm_vec_type hip diverged at the 9th KSP residual norm. The same test worked with rocm-5.6.0.

$ ./ex19 -dm_vec_type hip -da_refine 1 -ksp_norm_type unpreconditioned -pc_type none -{snes,ksp}_converged_reason -{snes,ksp}_monitor
lid velocity = 0.0204082, prandtl # = 1., grashof # = 1.
  0 SNES Function norm 1.461936698223e-01
    0 KSP Residual norm 1.461936698223e-01
    1 KSP Residual norm 1.353380984613e-01
    2 KSP Residual norm 9.321697386431e-02
    3 KSP Residual norm 5.717821772967e-02
    4 KSP Residual norm 3.785015863541e-02
    5 KSP Residual norm 2.625702800419e-02
    6 KSP Residual norm 2.096582429271e-02
    7 KSP Residual norm 1.640421327503e-02
    8 KSP Residual norm 1.103433883547e-02
    9 KSP Residual norm 1.098195330680e-02

Note that with Kokkos the 9th KSP residual norm is different (7.657261655432e-03 instead of 1.098195330680e-02):

$ ./ex19 -dm_vec_type kokkos -da_refine 1 -ksp_norm_type unpreconditioned -pc_type none -{snes,ksp}_converged_reason -{snes,ksp}_monitor
lid velocity = 0.0204082, prandtl # = 1., grashof # = 1.
  0 SNES Function norm 1.461936698223e-01
    0 KSP Residual norm 1.461936698223e-01
    1 KSP Residual norm 1.353380984613e-01
    2 KSP Residual norm 9.321697386431e-02
    3 KSP Residual norm 5.717821772967e-02
    4 KSP Residual norm 3.785015863541e-02
    5 KSP Residual norm 2.625702800419e-02
    6 KSP Residual norm 2.096582429271e-02
    7 KSP Residual norm 1.640421327503e-02
    8 KSP Residual norm 1.103433883547e-02
    9 KSP Residual norm 7.657261655432e-03

I set a breakpoint in PetscErrorCode VecSeq_CUPM<T>::MDot_. With nv=9, the old code splits the computation into two parts: the first 8 vectors use MDot_kernel_dispatch_ and the last vector uses cupmBlasXdot. I found that cupmBlasXdot returned a wrong value, z[8]=0.81948725666417521. The correct answer is 3.9924272474983336.
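To make the split concrete, here is a minimal host-side sketch of the two code paths described above. It is not the PETSc implementation: `dot`, `fused`, `mdot_split`, and `mdot_single_path` are hypothetical names, the batch width of 8 is only inferred from the nv=9 case, and plain loops stand in for the GPU kernel and the BLAS call. A CPU reference like this can be used to check each z[i] independently of the device.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;

// Single dot product; stands in for the BLAS dot call on the remainder path.
double dot(const Vec &x, const Vec &y)
{
  assert(x.size() == y.size());
  double s = 0.0;
  for (std::size_t k = 0; k < x.size(); ++k) s += x[k] * y[k];
  return s;
}

// Fused pass: one sweep over x updates z[start..start+count) together.
// Stands in for a multi-vector kernel (hypothetical, host-only).
void fused(const Vec &x, const std::vector<Vec> &ys, std::size_t start, std::size_t count, std::vector<double> &z)
{
  for (std::size_t j = start; j < start + count; ++j) {
    assert(ys[j].size() == x.size());
    z[j] = 0.0;
  }
  for (std::size_t k = 0; k < x.size(); ++k)
    for (std::size_t j = start; j < start + count; ++j) z[j] += x[k] * ys[j][k];
}

// Split scheme: full batches use fused(), leftover vectors use dot().
// With nv=9 and batch=8 this gives 8 fused results plus 1 dot() result.
std::vector<double> mdot_split(const Vec &x, const std::vector<Vec> &ys, std::size_t batch = 8)
{
  std::vector<double> z(ys.size(), 0.0);
  const std::size_t nfull = (ys.size() / batch) * batch;
  for (std::size_t s = 0; s < nfull; s += batch) fused(x, ys, s, batch, z);
  for (std::size_t j = nfull; j < ys.size(); ++j) z[j] = dot(x, ys[j]);
  return z;
}

// Single-path scheme: the remainder also goes through fused(), so no
// separate dot routine is involved. One possible simplification only.
std::vector<double> mdot_single_path(const Vec &x, const std::vector<Vec> &ys, std::size_t batch = 8)
{
  std::vector<double> z(ys.size(), 0.0);
  const std::size_t nfull = (ys.size() / batch) * batch;
  for (std::size_t s = 0; s < nfull; s += batch) fused(x, ys, s, batch, z);
  if (nfull < ys.size()) fused(x, ys, nfull, ys.size() - nfull, z);
  return z;
}
```

On the host both schemes must agree for every entry, including z[8]; a disagreement confined to the remainder entry is what points at the separate dot path.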

I also checked the hipblasPointerMode and the hip stream just before calling cupmBlasXdot. Both were as expected.

So I really don't know what is wrong with the old code. This MR just simplifies the code and avoids cupmBlasXdot in VecMDot.

Edited by Junchao Zhang
