Simplify CUPM VecMDot implementation
With rocm-5.7.1 on Noether, src/snes/tutorials/ex19 with hip diverged at KSP Residual norm 9. It worked with rocm-5.6.0.
$ ./ex19 -dm_vec_type hip -da_refine 1 -ksp_norm_type unpreconditioned -pc_type none -{snes,ksp}_converged_reason -{snes,ksp}_monitor
lid velocity = 0.0204082, prandtl # = 1., grashof # = 1.
0 SNES Function norm 1.461936698223e-01
0 KSP Residual norm 1.461936698223e-01
1 KSP Residual norm 1.353380984613e-01
2 KSP Residual norm 9.321697386431e-02
3 KSP Residual norm 5.717821772967e-02
4 KSP Residual norm 3.785015863541e-02
5 KSP Residual norm 2.625702800419e-02
6 KSP Residual norm 2.096582429271e-02
7 KSP Residual norm 1.640421327503e-02
8 KSP Residual norm 1.103433883547e-02
9 KSP Residual norm 1.098195330680e-02
Note that with Kokkos, the value at KSP Residual norm 9 is different:
$ ./ex19 -dm_vec_type kokkos -da_refine 1 -ksp_norm_type unpreconditioned -pc_type none -{snes,ksp}_converged_reason -{snes,ksp}_monitor
lid velocity = 0.0204082, prandtl # = 1., grashof # = 1.
0 SNES Function norm 1.461936698223e-01
0 KSP Residual norm 1.461936698223e-01
1 KSP Residual norm 1.353380984613e-01
2 KSP Residual norm 9.321697386431e-02
3 KSP Residual norm 5.717821772967e-02
4 KSP Residual norm 3.785015863541e-02
5 KSP Residual norm 2.625702800419e-02
6 KSP Residual norm 2.096582429271e-02
7 KSP Residual norm 1.640421327503e-02
8 KSP Residual norm 1.103433883547e-02
9 KSP Residual norm 7.657261655432e-03
I set a breakpoint in PetscErrorCode VecSeq_CUPM<T>::MDot_. With nv=9, the old code splits the computation into two parts: the first 8 vectors use MDot_kernel_dispatch_ and the last vector uses cupmBlasXdot. I found that cupmBlasXdot returned a wrong value, z[8]=0.81948725666417521. The correct answer should be 3.9924272474983336.
I also checked the hipblasPointerMode and the hip stream just before calling cupmBlasXdot. They were good.
So I really don't know what's wrong with the old code. I just simplified the code (and avoided cupmBlasXdot in VecMDot).
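To make the two strategies concrete, here is a minimal host-only C++ sketch, not the actual PETSc code. The names `dot`, `dot_batch`, `mdot_split`, `mdot_uniform`, and `kBatch` are hypothetical; `dot` stands in for a BLAS-style single dot product and `dot_batch` for a fused multi-vector kernel. How leftovers other than a single vector were handled in the old code is not stated above, so that branch is an assumption.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical batch width, chosen to mirror the "first 8 vectors" above.
constexpr std::size_t kBatch = 8;

// Single dot product; stands in for a BLAS-style dot call.
double dot(const std::vector<double>& x, const std::vector<double>& y) {
  double s = 0.0;
  for (std::size_t i = 0; i < x.size(); ++i) s += x[i] * y[i];
  return s;
}

// Fused batch: one pass over x computes `count` dot products starting at `first`.
void dot_batch(const std::vector<double>& x,
               const std::vector<std::vector<double>>& ys,
               std::size_t first, std::size_t count, std::vector<double>& z) {
  for (std::size_t j = 0; j < count; ++j) z[first + j] = 0.0;
  for (std::size_t i = 0; i < x.size(); ++i)
    for (std::size_t j = 0; j < count; ++j) z[first + j] += x[i] * ys[first + j][i];
}

// Split strategy: full batches use dot_batch, a single leftover vector uses dot.
std::vector<double> mdot_split(const std::vector<double>& x,
                               const std::vector<std::vector<double>>& ys) {
  const std::size_t nv = ys.size();
  std::vector<double> z(nv);
  std::size_t done = 0;
  while (nv - done >= kBatch) {
    dot_batch(x, ys, done, kBatch, z);
    done += kBatch;
  }
  const std::size_t rest = nv - done;
  if (rest == 1) {
    z[done] = dot(x, ys[done]);         // separate code path for the last vector
  } else if (rest > 1) {
    dot_batch(x, ys, done, rest, z);    // assumption: other leftovers stay batched
  }
  return z;
}

// Uniform strategy: every vector goes through the batched path.
std::vector<double> mdot_uniform(const std::vector<double>& x,
                                 const std::vector<std::vector<double>>& ys) {
  const std::size_t nv = ys.size();
  std::vector<double> z(nv);
  for (std::size_t done = 0; done < nv; done += kBatch)
    dot_batch(x, ys, done, std::min(kBatch, nv - done), z);
  return z;
}
```

With nv=9, `mdot_split` takes two different code paths (eight vectors batched, one through `dot`), while `mdot_uniform` takes only one. Comparing the two outputs entry by entry is one way to isolate which path produces a wrong value.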
Edited by Junchao Zhang