Adapt CUDA code to use all available GPUs on a node
Description
The CUDA backend now distributes the GPUs on a node among the MPI ranks in a round-robin fashion, as the OpenCL backend already does. Previously, all ranks on a node used only the first device.
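For reviewers, the round-robin assignment can be sketched as follows. This is a minimal illustration, not the code in this change: `device_for_rank` and `assign_devices` are hypothetical names, and the sketch assumes each rank knows its node-local index (for example from a communicator created with `MPI_Comm_split_type` and `MPI_COMM_TYPE_SHARED`). In a real backend the device count would come from `cudaGetDeviceCount` and the result would be passed to `cudaSetDevice`.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper (not taken from the Octopus source): map a rank's
// node-local index to a device index in round-robin order.
// Assumption: device_count would be obtained from cudaGetDeviceCount(), and
// the returned index would be handed to cudaSetDevice().
int device_for_rank(int local_rank, int device_count) {
    assert(local_rank >= 0 && device_count > 0);
    return local_rank % device_count;
}

// Device index chosen by each of the `ranks_on_node` ranks on one node.
std::vector<int> assign_devices(int ranks_on_node, int device_count) {
    std::vector<int> devices;
    devices.reserve(ranks_on_node);
    for (int r = 0; r < ranks_on_node; ++r) {
        devices.push_back(device_for_rank(r, device_count));
    }
    return devices;
}
```

With 6 ranks and 4 GPUs on a node, this mapping gives devices 0, 1, 2, 3, 0, 1. Under the previous behavior, every rank would have selected device 0.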
News snippet
Use all GPUs on a node with the CUDA backend.
Checklist
- I have checked that my code follows the Octopus coding standards