Draft: SYCL GPU Pairlist sorting (!4150) · Merge requests · GROMACS / GROMACS

Andrey Alekseenko requested to merge aa-4979-p2 into main Mar 05, 2024

Improve NBNXM kernel performance by making sure the work is more evenly distributed.

The approach closely follows the CUDA / HIP implementation.

sycl::popcount implementation added for AdaptiveCpp 23.10 and earlier.
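
For illustration, a software popcount fallback could look roughly like the sketch below. This is not the code from this MR: the name `popcountFallback` is hypothetical, and the classic SWAR bit-counting trick is used here only as one plausible way to fill in for a missing `sycl::popcount`.

```cpp
#include <cstdint>

// Hypothetical fallback, not the MR's actual implementation.
// Counts set bits in a 32-bit word with the classic SWAR approach.
constexpr unsigned int popcountFallback(std::uint32_t x)
{
    // Sum adjacent bits into 2-bit fields.
    x = x - ((x >> 1) & 0x55555555u);
    // Sum adjacent 2-bit fields into 4-bit fields.
    x = (x & 0x33333333u) + ((x >> 2) & 0x33333333u);
    // Sum adjacent 4-bit fields into bytes.
    x = (x + (x >> 4)) & 0x0F0F0F0Fu;
    // Add the four bytes together; the total ends up in the top byte.
    return (x * 0x01010101u) >> 24;
}
```

Being `constexpr` and free of library calls, a function like this can be compiled for both host and device; whether that matters for the affected AdaptiveCpp versions is an assumption.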

The prefix sum is inspired by oneDPL, which uses sycl::joint_exclusive_scan over a single work-group for small inputs. Our current histogram size is small enough, so we just do the same instead of pulling in the whole library.
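
To show what the scan computes, here is a host-side C++17 sketch, with `std::exclusive_scan` standing in for `sycl::joint_exclusive_scan`. The function name `bucketOffsets` and the integer keys are hypothetical; the actual kernel, key definition, and histogram size in this MR may differ.

```cpp
#include <numeric>
#include <vector>

// Hypothetical host-side sketch, not the MR's device code.
// Builds a histogram of bucket keys, then takes an exclusive prefix sum
// so that offsets[b] is the first output slot for bucket b.
// Assumes every key is in the range [0, numBuckets).
std::vector<int> bucketOffsets(const std::vector<int>& keys, int numBuckets)
{
    std::vector<int> histogram(numBuckets, 0);
    for (int key : keys)
    {
        ++histogram[key];
    }

    std::vector<int> offsets(numBuckets);
    // Stand-in for sycl::joint_exclusive_scan over a single work-group.
    std::exclusive_scan(histogram.begin(), histogram.end(), offsets.begin(), 0);
    return offsets;
}
```

With these offsets, each entry can be written to its bucket's range in the sorted list, as in a counting sort.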

Still a draft; needs more correctness / performance checks.

Fixes #4979

Edited Apr 17, 2024 by Andrey Alekseenko
