Error while running with ConnectX-6 InfiniBand
I'm using PETSc 3.21.0 in my application, with hypre as the preconditioner. My MPI is Intel MPI from Intel oneAPI 2024, and the InfiniBand adapter is a ConnectX-6. The application runs fine on a single node. However, when I switch to multiple nodes, the following error occurs:
[51]PETSC ERROR: ------------------------------------------------------------------------
[51]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range
[51]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
[51]PETSC ERROR: or see https://petsc.org/release/faq/#valgrind and https://petsc.org/release/faq/
[51]PETSC ERROR: configure using --with-debugging=yes, recompile, link, and run
[51]PETSC ERROR: to get more information on the crash.
[51]PETSC ERROR: Run with -malloc_debug to check if memory corruption is causing the crash.
Abort(59) on node 51 (rank 51 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 59) - process 51
I think my code is correct, because it runs fine on multiple nodes with ConnectX-5 InfiniBand.
We also tried changing the versions of PETSc, hypre, and oneAPI, but that did not help. So we suspect there may be an incompatibility between ConnectX-6 InfiniBand and PETSc.
We are wondering whether anyone has encountered this problem.