Fix PME-decomposition task assignment

Mark Abraham requested to merge pmeDecomposition_fixTaskAssignment into main

GPU force task assignment with multiple GPU tasks per rank is something of an edge case. When this code was written, the only relevant tricky case was offloading both PME and PP on a single rank when more than one device was detected. In that case, the former code ensured that only the first device was used by the two tasks.

Subsequently we added support for PME decomposition, for which the existing automated task assignment worked fine if there were at least as many tasks as GPUs detected. But if more devices were detected than there were ranks, we were missing the code to limit the devices to one per rank, which caused an exception to be thrown incorrectly.
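To illustrate the missing step, here is a minimal sketch of limiting the detected devices to one per rank. It is not the code in this merge request: `limitDevicesToOnePerRank`, `detectedDeviceIds` and `numRanksOnNode` are hypothetical names chosen for illustration, and the sketch assumes each rank runs a single GPU task.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, not the GROMACS implementation.
// Given the IDs of the detected devices and the number of ranks on the
// node, keep at most one device per rank, so that automated assignment
// never has more devices to hand out than ranks to receive them.
std::vector<int> limitDevicesToOnePerRank(const std::vector<int>& detectedDeviceIds,
                                          int                     numRanksOnNode)
{
    assert(numRanksOnNode > 0);
    std::vector<int> usableDeviceIds(detectedDeviceIds);
    if (usableDeviceIds.size() > static_cast<std::size_t>(numRanksOnNode))
    {
        // Keep the first numRanksOnNode devices, drop the surplus.
        usableDeviceIds.resize(static_cast<std::size_t>(numRanksOnNode));
    }
    return usableDeviceIds;
}
```

With two ranks and four detected devices, this sketch keeps only the first two devices. When there are fewer devices than ranks, the list is returned unchanged.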

This issue is most likely to have been seen when testing with two ranks on a machine with 4 or 8 GPUs. In practice, if the number of devices detected was greater than the number of ranks and not a multiple of the number of ranks, other logic forced the user to do a manual task assignment.

Real users are quite unlikely to have been affected as this code path requires setting multiple developer environment variables.

Fixes #4684 (closed)
