SYCL: Enable native atomics for DPC++/CUDA

Andrey Alekseenko requested to merge aa-enable-atomics-dpcpp-cuda into hwe-release-2022

Refs https://github.com/intel/llvm/issues/5936

Tested on a V100 with a 384k water box and a mid-January IntelLLVM build. The shuffle-based reduction (!2571, merged) is included.

Kernel runtime compared to CUDA-Clang (lower is better):

|        | NB F PME | NB FV PME | NB F RF | NB FV RF |
|--------|----------|-----------|---------|----------|
| Before | +57%     | +1300%    | +131%   | +1900%   |
| After  | +23%     | +44%      | +90%    | +137%    |
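
To make the table easier to read as per-kernel speedups, the overheads can be converted into before/after runtime ratios. This is only a rough sketch: it assumes the CUDA-Clang baseline is the same in both rows, and the `speedup` helper and the dictionary names are made up for illustration.

```python
# Overheads relative to CUDA-Clang, in percent, copied from the table above.
before = {"NB F PME": 57, "NB FV PME": 1300, "NB F RF": 131, "NB FV RF": 1900}
after = {"NB F PME": 23, "NB FV PME": 44, "NB F RF": 90, "NB FV RF": 137}


def speedup(before_pct, after_pct):
    """Hypothetical helper: ratio of kernel runtime before vs. after the change.

    An overhead of +X% means runtime = (1 + X/100) * baseline, so the
    (assumed identical) baseline cancels out in the ratio.
    """
    return (1 + before_pct / 100) / (1 + after_pct / 100)


speedups = {kernel: speedup(before[kernel], after[kernel]) for kernel in before}

for kernel, factor in speedups.items():
    print(f"{kernel}: {factor:.2f}x faster than before")
```

Under that assumption, the FV kernels come out roughly 8x to 10x faster than before, while the F kernels gain about 1.2x to 1.3x.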

On smaller systems, the difference is less dramatic.