SYCL: Enable native atomics for DPC++/CUDA (!2612)

Tested on a V100 with a 384k water box and a mid-January IntelLLVM build. The shuffle-based reduction (!2571, merged) is included.

Kernel runtime overhead relative to CUDA-Clang (lower is better):

	NB F PME	NB FV PME	NB F RF	NB FV RF
Before	+57%	+1300%	+131%	+1900%
After	+23%	+44%	+90%	+137%

On smaller systems, the difference is less dramatic.
