Different warp size on AMD/ROCm

"Note that Nvidia and AMD devices have different warp sizes, so portable code should use the warpSize built-ins to query the warp size. Hipified code from the Cuda path requires careful review to ensure it doesn’t assume a waveSize of 32. “Wave-aware” code that assumes a waveSize of 32 will run on a wave-64 machine, but it will utilize only half of the machine resources." [1]
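One place where a warp size of 32 tends to hide is the arithmetic that splits a thread index into a warp index and a lane index. The following is a minimal host-side sketch of that arithmetic, not device code. The names `LanePosition`, `lanePosition` and `lanePositionAssuming32` are made up for illustration, and the `warpSize` parameter stands in for the device built-in of the same name.

```cpp
#include <cassert>

// Host-side model of how a kernel splits a block-local thread index.
struct LanePosition
{
    int warp;  // index of the warp (wavefront) within the block
    int lane;  // index of the thread within its warp
};

// Portable form: the warp size is a parameter, standing in for the
// device built-in `warpSize` (hypothetical helper, not an existing API).
inline LanePosition lanePosition(int threadIndex, int warpSize)
{
    assert(warpSize > 0 && threadIndex >= 0);
    return { threadIndex / warpSize, threadIndex % warpSize };
}

// Non-portable form: the value 32 is baked in as a shift and a mask.
inline LanePosition lanePositionAssuming32(int threadIndex)
{
    return { threadIndex >> 5, threadIndex & 31 };
}
```

For thread 40, `lanePosition(40, 64)` gives warp 0, lane 40, while `lanePositionAssuming32(40)` gives warp 1, lane 8. On a wave-64 device the second form would therefore disagree with the hardware about which threads share a wavefront. Shifts by 5 and masks with 31 (or `0x1f`) are worth searching for during the review.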

We need to check:

  • parallel reduction
  • CSR light kernels
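For the parallel reduction, the risk can be illustrated with a host-side emulation of a shuffle-down tree reduction. This is only a sketch under assumptions: `shuffleDownReduce` is a hypothetical helper, not one of our kernels, and it models a shuffle whose source lane is out of range as returning the calling lane's own value.

```cpp
#include <cassert>
#include <vector>

// Host-side emulation of a shuffle-down tree reduction over one warp.
// `lanes` holds one value per lane (its size plays the role of warpSize);
// `startOffset` is the first shuffle distance.
// Returns the value that lane 0 holds after the last step.
inline int shuffleDownReduce(std::vector<int> lanes, int startOffset)
{
    const int width = static_cast<int>(lanes.size());
    assert(startOffset > 0 && startOffset < width);
    for (int offset = startOffset; offset > 0; offset /= 2) {
        // Every lane reads its partner's value from before this step,
        // so work from a snapshot.
        const std::vector<int> previous = lanes;
        for (int lane = 0; lane < width; ++lane) {
            // Assumption: an out-of-range source yields the lane's own value.
            const int source = lane + offset;
            lanes[lane] = previous[lane]
                        + (source < width ? previous[source] : previous[lane]);
        }
    }
    return lanes[0];
}
```

With 64 lanes that each hold 1, starting at the hard-coded offset 16 leaves lane 0 with 32, because lanes 32 to 63 never contribute. Starting at half the wave size (32) gives the full sum of 64. In device code the portable loop would start at `warpSize / 2` instead of 16.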

[1] Warp Cross-Lane Functions
Edited by Tomas Oberhuber