Potential bug in MatMatMult() for sparse-dense matmul (SpMM)
Dear PETSc developers,
I was testing sparse-dense matrix multiplication (SpMM, i.e., C := A * B, where A is sparse and B and C are dense) in PETSc and ran into errors.
My SpMV and SpMM test codes are available here: https://github.com/huanghua1994/HPC_Playground/tree/master/PETSc-test
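For reference, the SpMM operation described above can be sketched serially as follows. This is only an illustration of the math, not the PETSc implementation and not the linked test code. The function name csr_spmm_ref, the CSR array names, and the row-major layout of B and C are assumptions made for this sketch (PETSc itself stores dense matrices column-major).

```c
#include <assert.h>
#include <stddef.h>

/* Serial reference SpMM: C = A * B (hypothetical helper, not a PETSc routine).
 * A is sparse with n_rows rows, stored in CSR form (row_ptr, col_idx, val).
 * B has n_vec columns and C is n_rows x n_vec; both are dense and row-major
 * here, which is an assumption of this sketch. */
void csr_spmm_ref(size_t n_rows, size_t n_vec,
                  const size_t *row_ptr, const size_t *col_idx,
                  const double *val, const double *B, double *C)
{
    assert(row_ptr != NULL && col_idx != NULL && val != NULL);
    assert(B != NULL && C != NULL);

    for (size_t i = 0; i < n_rows; i++) {
        double *c_row = C + i * n_vec;

        /* Start each output row from zero. */
        for (size_t k = 0; k < n_vec; k++) c_row[k] = 0.0;

        /* For every nonzero A(i, j), add A(i, j) * B(j, :) to C(i, :). */
        for (size_t p = row_ptr[i]; p < row_ptr[i + 1]; p++) {
            const double a_ij = val[p];
            const double *b_row = B + col_idx[p] * n_vec;
            for (size_t k = 0; k < n_vec; k++) c_row[k] += a_ij * b_row[k];
        }
    }
}
```

A small serial reference like this can be used to check the output of a distributed SpMM on a matrix that fits on one process.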
I compiled and used PETSc release version 3.20.5 with Intel ICC v19.1.3 and MVAPICH2 2.3.6. The detailed configuration parameters can be found in the error message below.
I used the cage15 matrix from the SuiteSparse Matrix Collection as the input sparse matrix. I ran the SpMV and SpMM test codes using 96 MPI processes on 4 nodes (24 cores per node). The SpMV test code worked well, but the SpMM test code reported multiple errors.
For a dense B matrix with 2, 4, or 8 columns, I got the following error message:
[40]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
[40]PETSC ERROR: General MPI error
[40]PETSC ERROR: MPI error 1 Invalid buffer pointer
[40]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting.
[40]PETSC ERROR: Petsc Release Version 3.20.5, Feb 27, 2024
[40]PETSC ERROR: /storage/coda1/d-coc/0/hhuang368/data/SpMM-tests/./test_petsc_spmm.exe on a named atl1-1-03-003-29-2.pace.gatech.edu by hhuang368 Mon Apr 15 10:37:02 2024
slurmstepd: error: *** STEP 5683173.0 ON atl1-1-03-003-29-1 CANCELLED AT 2024-04-15T10:38:19 ***
srun: Job step aborted: Waiting up to 32 seconds for job step to finish.
[40]PETSC ERROR: Configure options --prefix=./install --with-cc=mpicc --with-cxx=mpicxx --with-fc=mpif90 --with-blaslapack-dir=/usr/local/pace-apps/spack/packages/linux-rhel7-x86_64/gcc-4.8.5/intel-parallel-studio-cluster.2020.4-5mxdw276vo2p6wtdkoaghj5h2zkmozjt/compilers_and_libraries_2020.4.304/linux/mkl --with-metis=1 --with-mkl_sparse=1 --with-batch --with-debugging=0 --with-openmp
[40]PETSC ERROR: #1 MatMPIDenseScatter() at /storage/scratch1/8/hhuang368/petsc-3.20.5/src/mat/impls/aij/mpi/mpimatmatmult.c:564
[40]PETSC ERROR: #2 MatMatMultNumeric_MPIAIJ_MPIDense() at /storage/scratch1/8/hhuang368/petsc-3.20.5/src/mat/impls/aij/mpi/mpimatmatmult.c:591
[40]PETSC ERROR: #3 MatProductNumeric_AB() at /storage/scratch1/8/hhuang368/petsc-3.20.5/src/mat/interface/matproduct.c:579
[40]PETSC ERROR: #4 MatProductNumeric() at /storage/scratch1/8/hhuang368/petsc-3.20.5/src/mat/interface/matproduct.c:680
[40]PETSC ERROR: #5 MatProduct_Private() at /storage/scratch1/8/hhuang368/petsc-3.20.5/src/mat/interface/matrix.c:10051
[40]PETSC ERROR: #6 MatMatMult() at /storage/scratch1/8/hhuang368/petsc-3.20.5/src/mat/interface/matrix.c:10100
[40]PETSC ERROR: #7 main() at test_petsc_spmm.c:110
[40]PETSC ERROR: No PETSc Option Table entries
[40]PETSC ERROR: ----------------End of Error Message -------send entire error message to petsc-maint@mcs.anl.gov----------
Please let me know whether this is a bug or whether I am using the PETSc functions incorrectly.
Thanks, Hua