
Draft: SYCL GPU Pairlist sorting

Andrey Alekseenko requested to merge aa-4979-p2 into main

Improve NBNXM kernel performance by sorting the GPU pairlist so that work is distributed more evenly.

The approach closely follows the existing CUDA / HIP implementation.
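A minimal host-side sketch of this kind of sorting, assuming the scheme is a bucket (counting) sort of i-cluster entries by their number of j-cluster entries, largest first. SciEntry, sortByWorkDescending and the numBuckets cap are hypothetical simplifications, not the GROMACS data structures. A GPU version would build the histogram and scatter with atomics, so order within a bucket would not be preserved there.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <vector>

// Hypothetical, simplified pairlist entry: an i-cluster index plus the number
// of j-cluster entries it must process (used as the work estimate).
struct SciEntry
{
    int sci;  // i-cluster index
    int numJ; // number of j-cluster entries, assumed >= 0
};

// Counting (bucket) sort of pairlist entries by work, largest first.
// numBuckets is an assumed cap: all entries with numJ >= numBuckets - 1
// land in the same bucket and keep their input order there.
std::vector<SciEntry> sortByWorkDescending(const std::vector<SciEntry>& in, int numBuckets)
{
    assert(numBuckets > 0);
    // Map a size to a bucket so that bucket 0 holds the largest lists.
    auto bucketOf = [numBuckets](int numJ) {
        return numBuckets - 1 - std::min(numJ, numBuckets - 1);
    };

    // 1. Histogram of bucket populations.
    std::vector<int> histogram(numBuckets, 0);
    for (const SciEntry& e : in)
    {
        histogram[bucketOf(e.numJ)]++;
    }

    // 2. Exclusive prefix sum: first output slot of each bucket.
    std::vector<int> offsets(numBuckets);
    std::exclusive_scan(histogram.begin(), histogram.end(), offsets.begin(), 0);

    // 3. Scatter each entry to the next free slot of its bucket.
    //    Sequentially this is stable; a GPU version using atomics would not be.
    std::vector<SciEntry> out(in.size());
    for (const SciEntry& e : in)
    {
        out[offsets[bucketOf(e.numJ)]++] = e;
    }
    return out;
}
```

Putting the longest lists first presumably lets the longest-running work start early, so shorter entries fill in the tail instead of one long entry finishing last.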

Added a fallback sycl::popcount implementation for AdaptiveCpp 23.10 and earlier.
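For reference, a population count can be computed portably with the classic SWAR bit-twiddling method. This is a plain C++ sketch with a hypothetical name, popcountFallback, covering only 32-bit unsigned values; it is not necessarily how the fallback in this MR is written (it might instead rely on a compiler builtin).

```cpp
#include <cstdint>

// Hypothetical portable 32-bit population count (SWAR method).
// Each step adds neighbouring bit-field counts in parallel.
constexpr int popcountFallback(std::uint32_t x)
{
    x = x - ((x >> 1) & 0x55555555u);                 // counts per 2-bit field
    x = (x & 0x33333333u) + ((x >> 2) & 0x33333333u); // counts per 4-bit field
    x = (x + (x >> 4)) & 0x0F0F0F0Fu;                 // counts per byte
    return static_cast<int>((x * 0x01010101u) >> 24); // sum of the four bytes
}

// Compile-time self-check.
static_assert(popcountFallback(0xFFFFFFFFu) == 32, "all bits set");
```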

Prefix sum inspired by oneDPL: for small inputs, oneDPL uses sycl::joint_exclusive_scan within a single work-group. Our current histogram size is small enough, so we do the same instead of pulling in the whole library.
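For intuition, here is a host-side emulation of what a single-work-group exclusive scan over a small histogram computes. groupExclusiveScan and its chunked three-phase strategy are assumptions for illustration: the result matches sycl::joint_exclusive_scan with sycl::plus<> (initial value 0), but SYCL runtimes and oneDPL may implement the scan differently.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical emulation of a single-work-group exclusive scan.
// Each of `groupSize` simulated work-items owns one contiguous chunk.
std::vector<int> groupExclusiveScan(const std::vector<int>& in, std::size_t groupSize)
{
    assert(groupSize > 0);
    const std::size_t n = in.size();
    const std::size_t chunk = (n + groupSize - 1) / groupSize; // ceil(n / groupSize)
    std::vector<int> out(n);
    std::vector<int> chunkTotals(groupSize, 0);

    // Phase 1: every work-item scans its own chunk sequentially.
    for (std::size_t item = 0; item < groupSize; ++item)
    {
        const std::size_t begin = std::min(item * chunk, n);
        const std::size_t end = std::min(begin + chunk, n);
        int running = 0;
        for (std::size_t i = begin; i < end; ++i)
        {
            out[i] = running;
            running += in[i];
        }
        chunkTotals[item] = running;
    }

    // Phase 2: exclusive scan of the per-item totals; on a GPU this would be
    // a group-level scan such as sycl::exclusive_scan_over_group.
    std::vector<int> chunkOffsets(groupSize);
    std::exclusive_scan(chunkTotals.begin(), chunkTotals.end(), chunkOffsets.begin(), 0);

    // Phase 3: each work-item adds the offset of its chunk.
    for (std::size_t item = 0; item < groupSize; ++item)
    {
        const std::size_t begin = std::min(item * chunk, n);
        const std::size_t end = std::min(begin + chunk, n);
        for (std::size_t i = begin; i < end; ++i)
        {
            out[i] += chunkOffsets[item];
        }
    }
    return out;
}
```

Keeping the whole scan inside one work-group avoids a multi-kernel scan, which is only reasonable while the input (here, the histogram) stays small.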

Still a draft; needs more correctness and performance checks.

Fixes #4979

Edited by Andrey Alekseenko
