Fix OpenCL Gather kernel on AMD RDNA2
The original version produced garbage output on gfx1032 with ROCm 5.3.
This looks like a compiler bug, but I see no reason to keep the current implementation, which is needlessly convoluted. The new version is simpler.
Note 1: This does not fully enable RDNA support; fixes to the NBNXM kernels are still needed.
Note 2: I have not tested the performance impact on other platforms. As discussed offline, this change is unlikely to cause any problems.
Refs #4521
Edited by Andrey Alekseenko