SYCL nbnxm: use readfirstlane AMD builtin

Use the AMD readfirstlane builtin to force uniform loads of the exclusion index and the interaction masks, thereby avoiding vector registers and vector operations. Some recent ROCm compilers, such as v5.3, optimize the exclusion index load automatically, but earlier ones do not, and imask loads are not optimized even by more recent ROCm versions.
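A minimal sketch of the idea, not the actual GROMACS code: the helper `uniformLoad`, the struct `ClusterEntry`, and the function `loadInteractionMask` are hypothetical names introduced for illustration. The only real API assumed is the clang builtin `__builtin_amdgcn_readfirstlane`, which is available when compiling for AMD GPU targets; on any other target the sketch falls back to returning the value unchanged.

```cpp
#include <cassert>

// Hypothetical helper (name is an assumption, not from the GROMACS source).
// On AMD GPU targets it broadcasts the value held by the first active lane,
// which tells the compiler the result is wavefront-uniform and can live in a
// scalar register. On every other target it is a no-op.
static inline int uniformLoad(int value)
{
#if defined(__AMDGCN__) && defined(__has_builtin)
#    if __has_builtin(__builtin_amdgcn_readfirstlane)
    return __builtin_amdgcn_readfirstlane(value);
#    else
    return value;
#    endif
#else
    return value;
#endif
}

// Hypothetical layout, for illustration only: one interaction mask and one
// exclusion index per entry.
struct ClusterEntry
{
    unsigned int imask;
    int          exclusionIndex;
};

// Loads the interaction mask of one entry and marks it as uniform.
// This is only valid when all lanes of a wavefront read the same entry.
static inline unsigned int loadInteractionMask(const ClusterEntry* entries, int entryIndex)
{
    assert(entries != nullptr);
    const int raw = static_cast<int>(entries[entryIndex].imask);
    return static_cast<unsigned int>(uniformLoad(raw));
}
```

The wrapper keeps the builtin behind a preprocessor guard so the same source still compiles for non-AMD backends. The transformation is only safe for values that really are identical across the wavefront, since lanes other than the first have their own value discarded.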

Observed performance improvements of up to 8% in interaction kernels and 5-15% in prune kernels on gfx90a; on older architectures such as gfx803, prune kernels improve by up to 30%.

Refs #3847 #3934