CUDA + RMM-DIIS causes `Illegal memory access` in the test-suite when k != GAMMA

I've been trying to compile QE for GPU using:

  • HPC-SDK 23.7 (with both the nv... and pg... compilers)
  • CUDA 12.1.1
  • OpenMPI 4.1.5 (tested both with and without it)

and running on a system with an A100 GPU.

All tests in the pw test-suite pass, but the ones that use RMM-DIIS diagonalization and involve k-points other than GAMMA, in particular:

  • system--pw_noncolin--noncolin-rmm
  • system--pw_scf--scf-rmm-k
  • system--pw_scf--scf-rmm-paro-k

fail right after RMM-DIIS diagonalization first appears in the output, with the error:

cudaMemcpy returned status 700: an illegal memory access was encountered
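
For context on reading this message: kernel launches are asynchronous, so an illegal access made inside a kernel is normally reported by the next call that has to wait for it, which here is a `cudaMemcpy`. The copy is therefore probably not the faulty call itself. The standalone sketch below illustrates this behaviour; it is not QE code, and the `bad_write` kernel is a hypothetical example that deliberately writes through a null device pointer.

```cuda
// Standalone illustration, NOT Quantum ESPRESSO code.
// bad_write is a hypothetical kernel used only to provoke the error.
#include <cstdio>
#include <cuda_runtime.h>

// Deliberately writes through an invalid (null) device pointer.
__global__ void bad_write(int *p) {
    p[0] = 42;
}

int main() {
    int *d_ok = nullptr;
    int h = 0;
    cudaMalloc(&d_ok, sizeof(int));

    // The launch returns before the kernel has run, so the invalid
    // write has usually not been detected at this point.
    bad_write<<<1, 1>>>(nullptr);
    cudaError_t launch_err = cudaGetLastError();
    printf("after launch:     %d (%s)\n",
           (int)launch_err, cudaGetErrorString(launch_err));

    // This copy uses a valid buffer, but it has to wait for the kernel,
    // so the kernel's fault is reported as the status of cudaMemcpy.
    cudaError_t copy_err =
        cudaMemcpy(&h, d_ok, sizeof(int), cudaMemcpyDeviceToHost);
    printf("after cudaMemcpy: %d (%s)\n",
           (int)copy_err, cudaGetErrorString(copy_err));

    // No cudaFree here: after an illegal access the context is no
    // longer usable and further runtime calls return the same error.
    return 0;
}
```

If the same mechanism applies here, the actual out-of-bounds access would be in a kernel launched before the failing copy, somewhere in the k-point RMM-DIIS path.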

The GAMMA-only counterparts do pass:

  • system--pw_scf--scf-rmm-gamma
  • system--pw_scf--scf-rmm-paro-gamma

I've seen from other issues that HPC-SDK with QE can be finicky, but since the failures seem to be confined to a specific section of the code (RMM-DIIS with k-points other than GAMMA), I thought this might be worth investigating.