Upstream AMD HIP port
Summary
Currently, AMD accelerators can be targeted through OpenCL (older devices) or through AdaptiveCpp with its ROCm backend (more modern devices). AMD maintains its own fork of GROMACS that adds a native HIP backend alongside the existing CUDA and SYCL backends, and that HIP backend shows performance benefits compared to AdaptiveCpp. To achieve optimal performance on modern AMD devices, we should bring this backend into GROMACS mainline.
Use cases
Users running GROMACS at AMD-based HPC centers (Dardel, LUMI, Frontier, ...) can achieve better performance for their simulations. Having the AMD port in mainline GROMACS will reduce confusion for users who want optimal performance and have to decide which version of GROMACS to use. Keeping the AdaptiveCpp backend available will make it easier to maintain a fully portable, backend-agnostic version of GROMACS.
Impact
Significant performance improvements when using AMD hardware.
Detailed description
To port the current implementation, the work needs to be split into several steps so that we don't end up with a single multi-thousand-line MR. The steps are:
- Add CMake logic to detect the HIP/ROCm toolchain during configuration
- Add build flags to build the application correctly
- Set up CI tests for new backend
- Add abstraction layer for HIP backend
- Add device detection and handling
- Add shims for offload targets
- Add schedule handling code
- Add nbnxm kernels
- Add PME kernels
- Add bonded kernels
- Add constraint/integrator kernels
- Tie things together
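As a rough illustration of what the "abstraction layer" step could look like, here is a minimal sketch of a buffer wrapper whose storage is provided by a backend type. All names (`HostFallbackBackend`, `BufferSketch`, `roundTrip`) are hypothetical and are not taken from GROMACS or the AMD fork. The stand-in backend uses host memory so the sketch compiles without ROCm; a HIP backend would presumably implement the same four functions on top of `hipMalloc`, `hipFree` and `hipMemcpy`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <type_traits>
#include <vector>

// Hypothetical stand-in backend that uses host memory so the sketch runs anywhere.
// A HIP backend would offer the same four functions, built on
// hipMalloc, hipFree and hipMemcpy (assumption, not the fork's actual design).
struct HostFallbackBackend
{
    static void* allocate(std::size_t bytes) { return std::malloc(bytes); }
    static void  release(void* ptr) { std::free(ptr); }
    static void  copyToDevice(void* dst, const void* src, std::size_t bytes)
    {
        std::memcpy(dst, src, bytes);
    }
    static void copyToHost(void* dst, const void* src, std::size_t bytes)
    {
        std::memcpy(dst, src, bytes);
    }
};

// Hypothetical buffer wrapper: calling code only sees this interface,
// never the backend-specific API.
template<typename T, typename Backend>
class BufferSketch
{
    static_assert(std::is_trivially_copyable<T>::value,
                  "raw byte copies require a trivially copyable element type");

public:
    explicit BufferSketch(std::size_t count) :
        count_(count), data_(static_cast<T*>(Backend::allocate(count * sizeof(T))))
    {
    }
    ~BufferSketch() { Backend::release(data_); }
    BufferSketch(const BufferSketch&) = delete;
    BufferSketch& operator=(const BufferSketch&) = delete;

    // Copy host data into the backend-owned storage.
    void upload(const std::vector<T>& host)
    {
        assert(host.size() == count_);
        Backend::copyToDevice(data_, host.data(), count_ * sizeof(T));
    }

    // Copy the backend-owned storage back into a new host vector.
    std::vector<T> download() const
    {
        std::vector<T> host(count_);
        Backend::copyToHost(host.data(), data_, count_ * sizeof(T));
        return host;
    }

private:
    std::size_t count_;
    T*          data_;
};

// Round trip through the buffer using the chosen backend.
template<typename Backend>
std::vector<float> roundTrip(const std::vector<float>& input)
{
    BufferSketch<float, Backend> buffer(input.size());
    buffer.upload(input);
    return buffer.download();
}
```

Calling `roundTrip<HostFallbackBackend>(values)` returns a copy of `values`. Selecting the backend through a template parameter is only one possible design; a build-time macro or a per-backend type alias would serve the same purpose.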
Requirements
The patch series for this umbrella issue should be mostly self-contained, but it will likely uncover issues in the schedule handling code along the way.
Links/references/implementations
The AMD fork with the HIP backend can be found here: https://github.com/ROCmSoftwarePlatform/Gromacs
I hope to stay close to that implementation, but will not constrain myself to it when push comes to shove.