
realm: Reduce in-flight broadcasts to avoid overloading network in allgatherv

Elliott Slaughter requested to merge realm-gasnetex-allgatherv into master

The current implementation of allgatherv is built on top of broadcast. Unfortunately, even though the broadcast implementation is itself optimized, allgatherv is still inefficient and can flood the network with a large number of messages when many broadcasts are in flight at once. This is a problem on the Slingshot 11 network, which can fail under heavy load. As a workaround, this MR limits the number of broadcasts that are in the network simultaneously, which bounds the load. The resulting allgatherv is slower, but since it only runs during initialization, the extra cost is probably acceptable.

A slower version of this change (which blocked after every broadcast instead of after every 16) has already been tested on Frontier and was sufficient to scale up to 8192 nodes, so I expect this version to be fine for all practical use cases.
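A minimal, single-process sketch of the throttling idea, assuming a batched scheme in which the caller waits for all outstanding broadcasts after every `max_in_flight` launches (16 in this MR). Whether Realm waits on a whole batch or uses a sliding window is not stated here, so treat that as an assumption. `allgatherv_bounded`, `InFlightTracker`, and the thread-based "broadcast" are hypothetical stand-ins, not Realm's GASNet-EX code: each simulated broadcast just copies one root's contribution into every rank's receive buffer at that root's displacement.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <future>
#include <vector>

// Hypothetical helper: counts broadcasts that have been launched but not
// yet completed, and records the peak seen at launch time.
struct InFlightTracker {
  std::atomic<std::size_t> current{0};
  std::size_t peak = 0;  // only written by the launching thread
  void on_launch() { peak = std::max(peak, ++current); }
  void on_complete() { --current; }
};

// Hypothetical allgatherv built from per-root "broadcasts", with at most
// max_in_flight broadcasts outstanding at any time.
// send[r] is rank r's variable-length contribution; the result holds every
// rank's receive buffer (all contributions concatenated in rank order).
std::vector<std::vector<int>> allgatherv_bounded(
    const std::vector<std::vector<int>>& send, std::size_t max_in_flight,
    InFlightTracker& tracker) {
  assert(max_in_flight > 0 && "need at least one broadcast in flight");
  const std::size_t nranks = send.size();

  // Displacement of each root's data inside the gathered buffer.
  std::vector<std::size_t> displs(nranks, 0);
  std::size_t total = 0;
  for (std::size_t r = 0; r < nranks; ++r) {
    displs[r] = total;
    total += send[r].size();
  }
  std::vector<std::vector<int>> recv(nranks, std::vector<int>(total));

  std::vector<std::future<void>> pending;
  for (std::size_t root = 0; root < nranks; ++root) {
    tracker.on_launch();
    // Simulated asynchronous broadcast from `root` to every rank. Different
    // roots write disjoint regions, so concurrent broadcasts do not race.
    pending.push_back(std::async(std::launch::async, [&, root] {
      for (auto& buf : recv)
        std::copy(send[root].begin(), send[root].end(),
                  buf.begin() + static_cast<std::ptrdiff_t>(displs[root]));
      tracker.on_complete();
    }));
    // Throttle: once max_in_flight broadcasts are outstanding, block until
    // all of them finish before launching more.
    if (pending.size() == max_in_flight) {
      for (auto& f : pending) f.get();
      pending.clear();
    }
  }
  for (auto& f : pending) f.get();  // drain the final partial batch
  return recv;
}
```

Setting `max_in_flight` to 1 mirrors the slower variant that was tested first (block after every broadcast); larger values allow more concurrent network traffic in exchange for fewer synchronization points.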

Note: the current version of this MR has now been validated up to 4096 nodes (8192 ranks).

Note: this MR has now been merged into !1229 and will be closed once that MR merges.

