WIP: Throttle pending isends
This is not a bug fix; it is an awkward workaround for MPI failures under overwhelming traffic.
In this email thread, PETSc user Randall Mackie reported a failure in PetscGatherMessageLengths(), reproduced with MPICH, Intel MPI 2018/2019, and Open MPI. Our investigation suggests the failure was due to overwhelming traffic.
In one test with 5120 ranks, a global vector on a DMDA is copied to global vectors of DMDAs on 320 sub-communicators. In PetscGatherMessageLengths(), each rank then has about 1200 send/recv neighbors.
To suppress the overwhelming traffic, this MR makes two changes:
- Skew the communication so that each rank first sends to ranks greater than itself.
- Introduce a new option, -max_pending_isends, to control the number of pending isends (i.e., isends not yet completed with MPI_Waitall). The default is 512. A sketch of the combined pattern follows this list.
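For illustration only, here is a minimal sketch of the two ideas combined; it is not the actual PETSc implementation. It assumes the destination ranks are sorted in ascending order, and the names ThrottledSendLengths, dests[], lens[], and max_pending are hypothetical.

```c
/* Sketch (not PETSc code): skew the send order and throttle pending isends.
   Assumes dests[] is sorted in ascending order of rank. */
#include <mpi.h>
#include <stdlib.h>

static int ThrottledSendLengths(MPI_Comm comm, int nto, const int dests[], const int lens[], int max_pending)
{
  int          rank, posted = 0, start = 0;
  MPI_Request *reqs;

  MPI_Comm_rank(comm, &rank);
  reqs = (MPI_Request *)malloc(sizeof(MPI_Request) * (max_pending > 0 ? max_pending : 1));

  /* Skew: start at the first destination greater than my rank and wrap around,
     so that lower-numbered ranks are not all targeted at once. */
  while (start < nto && dests[start] <= rank) start++;

  for (int k = 0; k < nto; k++) {
    int i = (start + k) % nto;
    MPI_Isend((void *)&lens[i], 1, MPI_INT, dests[i], 0 /* tag */, comm, &reqs[posted++]);
    /* Throttle: once max_pending isends are outstanding, wait for all of them
       before posting more. */
    if (posted == max_pending) {
      MPI_Waitall(posted, reqs, MPI_STATUSES_IGNORE);
      posted = 0;
    }
  }
  if (posted) MPI_Waitall(posted, reqs, MPI_STATUSES_IGNORE);
  free(reqs);
  return 0;
}
```

With -max_pending_isends at its default of 512, a rank with ~1200 neighbors would wait after roughly every 512 posted isends instead of keeping all of them in flight at once.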
Since I only observed failures in PetscGatherMessageLengths() and PetscCommBuildTwoSided(), I changed only those two functions and did not hunt for similar communication patterns throughout the PETSc code.