
WIP: Throttle pending isends

Junchao Zhang requested to merge jczhang/throttle-pending-isends into master

This is not a bug fix; it is an admittedly awkward workaround for MPI failures under overwhelming traffic.

In this email thread, PETSc user Randall Mackie reported a failure in PetscGatherMessageLengths(), reproduced with MPICH, Intel MPI 2018/2019, and OpenMPI. Our investigation suggests the failure was caused by overwhelming traffic.

In one test with 5120 ranks, a global vector on a DMDA is copied to global vectors of DMDAs on 320 sub-communicators. In PetscGatherMessageLengths(), each rank has about 1200 send+recv neighbors.

To suppress the overwhelming traffic, this MR makes two changes (a minimal sketch follows the list):

  • Skew the communication so that each rank sends to ranks greater than itself first.
  • Introduce a new option, -max_pending_isends, to control the number of pending isends (i.e., isends not yet completed with MPI_Waitall). The default is 512.
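Below is a minimal sketch of the two ideas written against plain MPI; it is not the code in this MR. The function name ThrottledSendLengths and the arguments dests/lengths/max_pending are illustrative only, with max_pending standing for the value of -max_pending_isends.

```c
#include <mpi.h>
#include <stdlib.h>

/* Send one integer length to each rank in dests[0..ndests-1], with at most
 * max_pending isends outstanding at a time. Destinations greater than the
 * calling rank are served first (the "skew"). */
static void ThrottledSendLengths(MPI_Comm comm, int ndests, const int *dests,
                                 const int *lengths, int max_pending)
{
  int          rank, i, k = 0, npending = 0;
  int         *order;
  MPI_Request *reqs;

  MPI_Comm_rank(comm, &rank);
  if (max_pending < 1) max_pending = 1;

  order = (int *)malloc((size_t)ndests * sizeof(int));
  reqs  = (MPI_Request *)malloc((size_t)max_pending * sizeof(MPI_Request));

  /* Skew: list destinations greater than myself first, then the rest, so that
   * all ranks do not flood the low-numbered ranks at the same time. */
  for (i = 0; i < ndests; i++) if (dests[i] >  rank) order[k++] = i;
  for (i = 0; i < ndests; i++) if (dests[i] <= rank) order[k++] = i;

  /* Throttle: never let more than max_pending isends be in flight. */
  for (i = 0; i < ndests; i++) {
    int j = order[i];
    MPI_Isend(&lengths[j], 1, MPI_INT, dests[j], 0 /* tag */, comm, &reqs[npending++]);
    if (npending == max_pending) {
      MPI_Waitall(npending, reqs, MPI_STATUSES_IGNORE);
      npending = 0;
    }
  }
  if (npending) MPI_Waitall(npending, reqs, MPI_STATUSES_IGNORE);

  free(order);
  free(reqs);
}
```

The sketch only shows the sending side; the destination ranks still have to post matching receives, as PetscGatherMessageLengths() does. With this MR, the cap would be chosen at run time, e.g. by running with -max_pending_isends 256 instead of the default 512.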

Since I have only observed failures in PetscGatherMessageLengths() and PetscCommBuildTwoSided(), I changed only those two routines and did not hunt for similar communication patterns elsewhere in the PETSc code.
