Possible deadlock with tmpi and pin=auto - Redmine #2025
It is possible that bAllSet isn’t the same for all tmpi ranks at threadaffinity.cpp:525. This in turn causes a deadlock, because setting the affinity requires global communication and only some of the ranks take part in it. This can’t happen with lib-MPI because in that case MPI_Allreduce is used. I suspect that it isn’t the same for all ranks because the affinity gets changed by some threads (not sure whether by OpenMP/MPI or by GROMACS) while others are still testing it.
Not sure what the best solution is because tmpi isn’t initialized yet at that spot. Thus one cannot simply do an MPI_Barrier or MPI_Allreduce.
This is with ICC 17beta1 on KNL but should be possible to reproduce with other compilers/CPUs.
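To make the failure mode described above concrete, here is a minimal sketch in plain C++. It is not GROMACS code. The helper name `ranksDisagree` and the idea of collecting one bAllSet value per rank into a vector are assumptions made purely for illustration. It only models the condition under which a collective call guarded by a rank-local flag would hang: some ranks take the branch and others don't.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical helper (not part of GROMACS): each entry is the bAllSet value
// that one thread-MPI rank computed locally from its own affinity mask.
// Returns true when the ranks disagree. In that case only a subset of ranks
// would reach the collective step, and those ranks would wait forever for
// the ones that skipped it.
bool ranksDisagree(const std::vector<bool>& bAllSetPerRank)
{
    assert(!bAllSetPerRank.empty());
    const bool first = bAllSetPerRank.front();
    // Any entry differing from the first means the decision is not uniform.
    return std::any_of(bAllSetPerRank.begin(), bAllSetPerRank.end(),
                       [first](bool b) { return b != first; });
}
```

For example, `ranksDisagree({true, false, true})` reports a disagreement, while `ranksDisagree({true, true, true})` does not. With lib-MPI the flag is reduced across ranks first, so every rank acts on the same value and this mixed case cannot arise.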
(from redmine: issue id 2025, created on 2016-08-08 by rolandschulz, closed on 2016-10-19)
- Changesets:
- Revision 82216120 by Berk Hess on 2016-08-10T05:50:23Z:
Fix deadlock with thread-MPI
With thread-MPI mdrun could deadlock while pinning threads.
Fixes #2025.
Change-Id: Ib42e9625134531b1e2f910b11339aa0f78b80624