
Draft: Update STRUMPACK interface

Pieter Ghysels requested to merge pghysels/petsc:pghysels/strumpack-update into main

This PR updates the interface to STRUMPACK.


Note: running "make alltests TIMEOUT=600" gives many "not ok" results and other errors, originating from PetscSF, ParMETIS, and SuperLU_DIST.

STRUMPACK provides a sparse direct solver (but unlike, e.g., SuperLU, it is based on the multifrontal method).
STRUMPACK also provides preconditioners based on approximate multifrontal LU factorization. In these preconditioners, the larger dense blocks in the sparse LU factors are compressed using rank-structured matrix approximations (or using ZFP compression).

See
https://github.com/pghysels/STRUMPACK
https://portal.nersc.gov/project/sparse/strumpack/

This PR adds a GPU option to the STRUMPACK interface, and in the build system enables CUDA when building STRUMPACK using --download-strumpack. The sparse direct solver in STRUMPACK has good GPU (CUDA and HIP) performance. STRUMPACK expects all input/output on the CPU, but can internally off-load work for the numerical factorization to the GPU. The preconditioners do not yet support GPU off-loading.
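As a sketch of how this is used (the configure line and the application name ./app are illustrative; -mat_strumpack_gpu is the run-time option used in the multiple-RHS example further down):

```shell
# Configure PETSc with CUDA and download/build STRUMPACK with CUDA enabled
# (illustrative configure line; adjust options and paths for your system):
./configure --with-cuda --download-strumpack --download-metis --download-parmetis

# At run time, off-load the numerical factorization to the GPU.
# The matrix and vectors (input/output) stay on the CPU; STRUMPACK
# moves the factorization work to the device internally:
mpirun -n 4 ./app -pc_type lu -pc_factor_mat_solver_type strumpack \
  -mat_strumpack_gpu 1
```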

This PR also adds several new preconditioning options. Previously only the HSS (Hierarchically Semi-Separable) format was supported. Now we also have BLR (Block Low Rank), HODLR (Hierarchically Off-Diagonal Low Rank) with the option to replace low-rank blocks with butterfly compression, and Lossy and Lossless compression (through ZFP). The new methods work considerably better than the older HSS code.

When using
-pc_type lu -pc_factor_mat_solver_type strumpack
the solver behaves as a direct solver. When using
-pc_type ilu -pc_factor_mat_solver_type strumpack
it works as a preconditioner, using BLR compression. This should be a robust preconditioner for a wide range of problems. The compression tolerance and the minimum block (separator) size for compression can be tuned using:
-mat_strumpack_compression_rel_tol 1e-3 -mat_strumpack_compression_min_sep_size 500
You can also select the compression type explicitly:
-pc_type ilu -pc_factor_mat_solver_type strumpack -mat_strumpack_compression LOSSY
For now we recommend BLR, as it seems to work best and does not require additional external dependencies (BLR is also what MUMPS implements).
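Putting the options above together, a typical BLR-preconditioned run might look like the following sketch (./app is a placeholder for any PETSc application; BLR is assumed to be a valid value of -mat_strumpack_compression, analogous to LOSSY above):

```shell
# Use STRUMPACK's approximate factorization as a preconditioner with
# BLR compression inside a Krylov solver (GMRES here):
mpirun -n 4 ./app \
  -ksp_type gmres -ksp_monitor \
  -pc_type ilu -pc_factor_mat_solver_type strumpack \
  -mat_strumpack_compression BLR \
  -mat_strumpack_compression_rel_tol 1e-3 \
  -mat_strumpack_compression_min_sep_size 500
```

Loosening -mat_strumpack_compression_rel_tol makes the factorization cheaper but weakens the preconditioner, so more Krylov iterations are typically needed.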

I updated the test src/ksp/ksp/tutorials/ex52.c.
TODO: why can I not overwrite options set in ex52, from the command line?

If you can point me to more challenging examples/miniapps, then I will run more performance tests.

This PR now implements the solve with multiple RHS. I used src/ksp/ksp/test/ex26.c to test the solve with multiple RHS using:

OMP_NUM_THREADS=1 mpirun -n 4 ./ex26 \
  -nx 100 -ny 100 -Nx 2 -Ny 2 -nrhs 10 \
  -ksp_type preonly -ksp_monitor \
  -pc_type lu -pc_factor_mat_solver_type strumpack \
  -mat_strumpack_verbose 0 -mat_strumpack_compression_rel_tol 1e-3 \
  -mat_strumpack_compression_leaf_size 128 \
  -mat_strumpack_compression_min_sep_size 500 \
  -mat_strumpack_gpu 0

This PR also adds an option for the GEOMETRIC fill-reducing ordering, which performs nested dissection on a regular nx x ny x nz grid with nc degrees of freedom per grid point and a given stencil (potentially a wider stencil). It is assumed that the matrix is in the natural (lexicographic) ordering.
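As a sketch (the option name -mat_strumpack_reordering is an assumption; ./app is a placeholder), selecting the geometric nested-dissection ordering could look like:

```shell
# Select the GEOMETRIC nested-dissection fill-reducing ordering
# (assumed option name; requires the matrix to be in the natural,
# lexicographic ordering of the regular grid):
mpirun -n 4 ./app -pc_type lu -pc_factor_mat_solver_type strumpack \
  -mat_strumpack_reordering GEOMETRIC
```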

In PETSc's build system, we now detect if PARMETIS was found and pass that along to STRUMPACK.

STRUMPACK has other dependencies which are not handled through the PETSc build system: SLATE, ButterflyPACK, ZFP, and CombBLAS.

  • SLATE is used as a GPU-enabled alternative to ScaLAPACK, so it is required for good GPU performance in the distributed-memory sparse direct solver.
  • ZFP is used for LOSSY and LOSSLESS compression.
  • ButterflyPACK is used for HODLR (Hierarchically Off-Diagonal Low Rank) and HODBF (Hierarchically Off-Diagonal Butterfly) compression.
  • CombBLAS can be used for parallel static pivoting, i.e., finding a matching to permute nonzero elements to the diagonal. Currently the only option is MC64, which is sequential and hence requires gathering the graph to a single MPI process. This can be disabled with -mat_strumpack_colperm 0.

For now, these dependencies cannot be enabled through PETSc's build system when building with --download-strumpack, and when using an external install of STRUMPACK, the correct external libraries are not passed along. However, SPACK handles these dependencies for STRUMPACK.
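As a sketch of the Spack route, a spec pulling in these optional dependencies might look like the following (the variant names are assumptions about the Spack strumpack package; verify with `spack info strumpack`):

```shell
# Build STRUMPACK with CUDA plus the optional SLATE, ButterflyPACK,
# and ZFP dependencies via Spack (variant names are assumptions;
# check `spack info strumpack` for the actual variants):
spack install strumpack +cuda +slate +butterflypack +zfp
```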

STRUMPACK can now be built as a shared or static library.

TODO:

  • HIP support for STRUMPACK in PETSc's build system. I could not connect to the AMD GPU test system, but will try again.
  • Check whether the documentation renders correctly. Add references.

Joint work with Xiaoye Sherry Li, Lisa Claus and Yang Liu, all from LBNL. Sponsored through ECP and SciDAC FASTMath.
