SYCL nbnxm: use AMD DPP intrinsic for j reduction

Szilárd Páll requested to merge sz_sycl-use-amd-dpp-for-reduction into main

Improves nbnxm kernel performance by up to 10% (F kernels) and 6% (VF kernels) on gfx90a, and by 5-6% on gfx908.

A DPP update-based shuffle function is added to the SYCL kernel utils so that it can be reused elsewhere.
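For readers unfamiliar with DPP (data-parallel primitives), the following host-side sketch models how a row-shift shuffle can drive a reduction across lanes. It is not the code from this merge request: the names `emulateRowShiftRight` and `reduceRows` are hypothetical, and the shift pattern is an assumption chosen for illustration. On the device, the shuffle step would presumably map to a DPP update intrinsic such as clang's `__builtin_amdgcn_update_dpp`, rather than to the loop used here.

```cpp
#include <array>
#include <cstddef>

// Hypothetical host-side model for illustration only; not GROMACS code.
constexpr std::size_t c_waveSize = 64; // lanes per wavefront on gfx908/gfx90a
constexpr std::size_t c_rowSize  = 16; // lanes per DPP row

using Wave = std::array<float, c_waveSize>;

// Models a DPP "row shift right": within each 16-lane row, lane i reads the
// value of lane i - shift. Lanes without a source inside their row keep the
// corresponding value from `old`.
Wave emulateRowShiftRight(const Wave& old, const Wave& src, std::size_t shift)
{
    Wave result = old;
    for (std::size_t lane = 0; lane < c_waveSize; lane++)
    {
        const std::size_t laneInRow = lane % c_rowSize;
        if (laneInRow >= shift)
        {
            result[lane] = src[lane - shift];
        }
    }
    return result;
}

// Sums the values of each 16-lane row using shifts of 1, 2, 4 and 8.
// After the loop, the last lane of every row holds that row's total
// (the other lanes hold partial prefix sums).
Wave reduceRows(Wave values)
{
    const Wave zeros{}; // out-of-row sources contribute zero
    for (std::size_t shift = 1; shift < c_rowSize; shift *= 2)
    {
        const Wave shifted = emulateRowShiftRight(zeros, values, shift);
        for (std::size_t lane = 0; lane < c_waveSize; lane++)
        {
            values[lane] += shifted[lane];
        }
    }
    return values;
}
```

In this model each reduction step is a single lane-to-lane move followed by an add, which is the kind of operation a DPP modifier can express without going through local memory.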

Refs #3847 #3934

Edited by Szilárd Páll
