SYCL nbnxm: use readfirstlane AMD builtin (!3282) · Merge requests · GROMACS / GROMACS

Szilárd Páll requested to merge sz_nbnxm-sycl-use-readfirstlane-to-scalarize into main Nov 09, 2022

Use the readfirstlane AMD builtin to force a uniform load of the exclusion index and interaction masks, thereby avoiding vector registers and vector operations. Some recent ROCm compilers, such as v5.3, optimize the former automatically while earlier ones do not, but the imask loads are not optimized even by more recent ROCm versions.
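A minimal sketch of the idea, not the code from this merge request: the names `uniformLoad`, `PairListEntry`, and `loadEntryUniform` are hypothetical and chosen only for illustration. It assumes the value is already identical across all lanes of the wavefront, since `__builtin_amdgcn_readfirstlane` returns the value held by the first active lane. The guard on `__AMDGCN__` is an assumption about how the builtin might be restricted to AMD device compilation; on other targets the helper is a plain pass-through.

```cpp
#include <cstdint>

// Hypothetical helper: hint to the compiler that `value` is wavefront-uniform.
// On AMD GCN/CDNA device code the builtin broadcasts the first active lane's
// value, which lets the compiler keep the result in a scalar register.
static inline std::uint32_t uniformLoad(std::uint32_t value)
{
#if defined(__AMDGCN__)
    return static_cast<std::uint32_t>(
            __builtin_amdgcn_readfirstlane(static_cast<int>(value)));
#else
    // Host or non-AMD device pass: no scalarization hint available here.
    return value;
#endif
}

// Hypothetical stand-in for a pair-list entry holding the two fields the
// description mentions: an interaction mask and an exclusion index.
struct PairListEntry
{
    std::uint32_t imask;
    std::uint32_t exclusionIndex;
};

// Load both fields of entry `j` through the uniform-load helper.
// Only valid if every lane in the wavefront uses the same `j`.
static inline PairListEntry loadEntryUniform(const PairListEntry* entries, int j)
{
    return { uniformLoad(entries[j].imask), uniformLoad(entries[j].exclusionIndex) };
}
```

The helper changes no results when the uniformity assumption holds; it only affects which register class the compiler may pick for the loaded values.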

Observed performance improvements of up to 8% in interaction kernels and 5-15% in prune kernels on gfx90a; on older architectures such as gfx803, the latter improves by up to 30%.

Refs #3847 #3934
