
WIP: Throttle pending isends

Junchao Zhang requested to merge jczhang/throttle-pending-isends into master

This is not a bug fix; it is an admittedly awkward workaround for MPI failures under overwhelming traffic.

In this email thread, PETSc user Randall Mackie reported a failure in PetscGatherMessageLengths(), reproduced with MPICH, Intel MPI 2018/2019, and OpenMPI. Our investigation suggests the failure was caused by overwhelming traffic.

In one test with 5120 ranks, a global vector on a DMDA is copied to global vectors of DMDAs on 320 sub-communicators. In PetscGatherMessageLengths(), each rank has about 1200 send+recv neighbors.

To suppress the overwhelming traffic, this MR makes two changes (a minimal sketch follows the list):

  • Skew the communication so that each rank sends to ranks greater than itself first.
  • Introduce a new option, -max_pending_isends, to control the number of pending isends (i.e., isends not yet completed with MPI_Waitall). The default is 512.
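Below is a minimal sketch of the two ideas written against plain MPI; it is not the code in this MR. The function name ThrottledSendLengths and the arguments dests/lengths/max_pending are illustrative only, with max_pending standing for the value of -max_pending_isends.

```c
#include <mpi.h>
#include <stdlib.h>

/* Send one integer length to each rank in dests[0..ndests-1], with at most
 * max_pending isends outstanding at a time. Destinations greater than the
 * calling rank are served first (the "skew"). */
static void ThrottledSendLengths(MPI_Comm comm, int ndests, const int *dests,
                                 const int *lengths, int max_pending)
{
  int          rank, i, k = 0, npending = 0;
  int         *order;
  MPI_Request *reqs;

  MPI_Comm_rank(comm, &rank);
  if (max_pending < 1) max_pending = 1;

  order = (int *)malloc((size_t)ndests * sizeof(int));
  reqs  = (MPI_Request *)malloc((size_t)max_pending * sizeof(MPI_Request));

  /* Skew: list destinations greater than myself first, then the rest, so that
   * all ranks do not flood the low-numbered ranks at the same time. */
  for (i = 0; i < ndests; i++) if (dests[i] >  rank) order[k++] = i;
  for (i = 0; i < ndests; i++) if (dests[i] <= rank) order[k++] = i;

  /* Throttle: never let more than max_pending isends be in flight. */
  for (i = 0; i < ndests; i++) {
    int j = order[i];
    MPI_Isend(&lengths[j], 1, MPI_INT, dests[j], 0 /* tag */, comm, &reqs[npending++]);
    if (npending == max_pending) {
      MPI_Waitall(npending, reqs, MPI_STATUSES_IGNORE);
      npending = 0;
    }
  }
  if (npending) MPI_Waitall(npending, reqs, MPI_STATUSES_IGNORE);

  free(order);
  free(reqs);
}
```

The sketch only shows the sending side; the destination ranks still have to post matching receives, as PetscGatherMessageLengths() does. With this MR, the cap would be chosen at run time, e.g. by running with -max_pending_isends 256 instead of the default 512.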

Since I have only observed failures in PetscGatherMessageLengths() and PetscCommBuildTwoSided(), I changed only those two routines and did not hunt for similar communication patterns elsewhere in the PETSc code.
