VkFFT backend for OpenCL/Metal
VkFFT was added to the OpenCL backend as the library for GPU-accelerated FFTs, even though OpenCL is deprecated. This enabled PME offload on Apple silicon GPUs (#4615 (closed)). It also presents an opportunity to remove GROMACS's dependency on the unmaintained clFFT.
To-do list:
- Use VkFFT to allow PME offload on M1-based SoCs (!3166 (merged))
- Get VkFFT working on Intel Macs with Intel iGPUs and AMD dGPUs, through OpenCL or Metal
- Test VkFFT on Linux and Windows with Intel, AMD, and NVIDIA GPUs through the OpenCL backend
- Evaluate the performance of VkFFT vs. clFFT on all hardware platforms
- If possible, remove the dependency on clFFT (example: !3162 (closed))
In addition, we should consider creating a Metal backend for the VkFFT library, which would allow optimal performance in a hypothetical hipSYCL Metal backend for Apple silicon. Among other optimizations, this would enable GPU-resident execution, where all of a time step's commands are executed inside a single concurrent MTLComputeEncoder. If VkFFT works well on M1 through OpenCL, it should also work correctly through Metal.
Either now (with OpenCL) or later (with a hipSYCL Metal backend):
- Determine whether, on the M1 chip, a VkFFT Metal backend achieves higher GPU-side performance than OpenCL. This might happen because Metal makes SIMD permute and reduction operations available.
- Profile the performance impact of offloading PME to the GPU on different Apple silicon chips (e.g. M1 vs. M1 Pro vs. M1 Max vs. M2). We don't know whether offloading harms performance on the smaller Apple silicon GPUs.