Polaris Fixes (!107) · Merge requests · Institute for Advanced Study / High-Performance Computing / jmstone / Athena-Parthenon / AthenaK

This MR

Adds Kokkos::fence()s after the packing of buffers but before MPI sends.

On Polaris, erratic failure modes arose, presumably because buffers were not fully packed before the MPI send executed. For unknown reasons, this never showed up on Apollo, perhaps because the interconnect on Apollo is slower than that on Polaris.
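
The ordering issue can be illustrated with a toy model in plain C++ (no Kokkos or MPI). This is only a sketch: `ToyQueue`, `ToySend`, and `PackAndSend` are hypothetical names, and deferring all work until `fence()` is an assumed, exaggerated stand-in for asynchronous kernel launches.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Toy model (hypothetical, not Kokkos): launch() only enqueues work, and
// fence() runs everything queued so far, mimicking asynchronous kernels.
struct ToyQueue {
  std::vector<std::function<void()>> pending;
  void launch(std::function<void()> work) { pending.push_back(std::move(work)); }
  void fence() {
    for (auto &work : pending) work();
    pending.clear();
  }
};

// Stand-in for an MPI send: it reads the buffer at the moment it is called.
inline std::vector<double> ToySend(const std::vector<double> &buf) { return buf; }

// Pack `src` into a send buffer and send it, optionally fencing in between.
inline std::vector<double> PackAndSend(const std::vector<double> &src,
                                       bool fence_before_send) {
  assert(!src.empty());
  ToyQueue queue;
  std::vector<double> buf(src.size(), 0.0);
  // "Packing kernel": queued here, not yet executed.
  queue.launch([&src, &buf]() {
    for (std::size_t i = 0; i < src.size(); ++i) buf[i] = src[i];
  });
  if (fence_before_send) queue.fence();  // role played by Kokkos::fence()
  std::vector<double> sent = ToySend(buf);
  queue.fence();  // drain remaining work while buf is still alive
  return sent;
}
```

In this toy, `PackAndSend(src, true)` sends the packed data, while `PackAndSend(src, false)` sends the still-zeroed buffer, which is the kind of failure the added fences are meant to rule out.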

Adds various team_barrier()s throughout bvals.

I am sure many of the added barriers are overkill, but I have not noticed a huge performance hit with the additions.

Extends FOFC evaluation to the single layer of ghost cells surrounding a MeshBlock (hence addressing Issue #8 (closed)).

Prior to this change, a Komissarov blast wave test (b=1) would run to completion when using a single MeshBlock but fail when using multiple MeshBlocks (independent of the number of MPI ranks). This indicated that FOFC in ghost cells was likely necessary, which requires extending index ranges in flux calculations by one. An earlier, alternative attempt communicated FOFC flags across MeshBlocks; however, for unknown reasons, this severely hindered performance on Polaris. We should still consider how FOFC should be handled at refinement boundaries.
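
A minimal 1D sketch of the index logic, in plain C++ rather than the actual AthenaK kernels. The names `FofcFlags` and `FirstOrderFaces` are hypothetical, and the flagging criterion (a negative provisional value) is an assumption made only for illustration.

```cpp
#include <cassert>
#include <vector>

// Flag cells whose provisional (high-order) update is unphysical.
// Assumption: "unphysical" is modeled here as a negative value.
// extend = 0 tests active cells only; extend = 1 also tests one ghost layer.
inline std::vector<bool> FofcFlags(const std::vector<double> &u_new, int ng,
                                   int extend) {
  assert(ng >= 1 && extend >= 0 && extend <= ng);
  const int ntot = static_cast<int>(u_new.size());
  assert(ntot > 2 * ng);
  const int is = ng, ie = ntot - ng - 1;  // first and last active cell
  std::vector<bool> flag(ntot, false);
  for (int i = is - extend; i <= ie + extend; ++i) flag[i] = (u_new[i] < 0.0);
  return flag;
}

// Face i sits between cells i-1 and i. Mark it for the first-order flux
// if either neighboring cell is flagged.
inline std::vector<bool> FirstOrderFaces(const std::vector<bool> &flag, int ng) {
  const int ntot = static_cast<int>(flag.size());
  const int is = ng, ie = ntot - ng - 1;
  std::vector<bool> face(ntot + 1, false);
  for (int i = is; i <= ie + 1; ++i) face[i] = flag[i - 1] || flag[i];
  return face;
}
```

With a bad value in ghost cell `is-1`, the boundary face `is` is marked for the first-order flux only when `extend = 1`. With `extend = 0` the flag in the ghost cell is never set, so this block would not see a flag for a cell that its neighbor (which owns that cell as an active cell) may have flagged.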

Bumps Kokkos version.

For my normal workflow, the biggest change here is that MPI runs now take the argument --kokkos-map-device-id-by=mpi_rank. --kokkos-num-devices is now deprecated.

Cleanup to bvals.

Among the cleanup, this MR eliminates some warnings seen on Polaris and reported by @zappa regarding calling a __host__ function from a __host__ __device__ function. The prior design was likely harmless. This MR closes #14 (closed).

Cleanup to torus.

Various changes to clean up torus in prep for Polaris calculations.

Edited Feb 10, 2023 by Patrick Mullen
