SYCL: Add optimized atomicAdd flavor for AMD GPUs
By default, floating-point atomics on AMD MI100 (gfx908) and MI200 (gfx90a) GPUs are implemented with compare-and-swap (CAS) loops. This change instead calls special flavors of atomicAdd that compile to native hardware instructions, which improves performance on these devices. AMD MI50 (gfx906) is unaffected and always uses a CAS loop.
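To illustrate what a CAS loop is, here is a minimal host-side C++ sketch using `std::atomic<float>`. It is not the code from this change, and `casAtomicAdd` and `sumWithCasLoop` are hypothetical names chosen for the example. The loop reads the current value, computes the new sum, and retries the compare-and-swap until no other thread has modified the value in between. A native atomic-add instruction does the read-modify-write in a single hardware operation, with no retries.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical helper: emulate a floating-point atomic add with a
// compare-and-swap (CAS) loop, analogous to the fallback used when no
// native floating-point atomic-add instruction is available.
inline float casAtomicAdd(std::atomic<float>& target, float value)
{
    float expected = target.load(std::memory_order_relaxed);
    // If another thread changed 'target' since we read it, the exchange
    // fails, 'expected' is reloaded with the current value, and we retry.
    while (!target.compare_exchange_weak(
            expected, expected + value, std::memory_order_relaxed, std::memory_order_relaxed))
    {
    }
    return expected; // value before the addition, like fetch_add
}

// Hypothetical driver: numThreads threads each add 1.0f numAdds times.
inline float sumWithCasLoop(int numThreads, int numAdds)
{
    std::atomic<float>       total{ 0.0f };
    std::vector<std::thread> threads;
    for (int t = 0; t < numThreads; ++t)
    {
        threads.emplace_back([&total, numAdds] {
            for (int i = 0; i < numAdds; ++i)
            {
                casAtomicAdd(total, 1.0f);
            }
        });
    }
    for (auto& th : threads)
    {
        th.join();
    }
    return total.load();
}
```

Under contention, every failed exchange costs another round trip to memory, so a CAS loop degrades as more threads target the same address. That is the overhead native instructions avoid. C++20 also provides `std::atomic<float>::fetch_add`, but whether it maps to a single instruction depends on the target.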
Refs #4465, #3935, #3965
Edited by Andrey Alekseenko