Fix OpenCL Gather kernel on AMD RDNA2

Andrey Alekseenko requested to merge aa-opencl-rdna2-ewald into main

The original version of the kernel produced garbage output on gfx1032 with ROCm 5.3.

This looks like a compiler bug, but I see no reason to keep the current implementation, which is needlessly convoluted. The new version is simpler.

Note 1: This does not fully enable RDNA support; fixes to the NBNXM kernels are still needed.

Note 2: I have not tested the performance impact on other platforms. As discussed offline, this change is unlikely to cause any problems.

Refs #4521

Edited by Andrey Alekseenko
