Adapt CUDA code to use all available GPUs on a node
Assign the GPUs on a node to the MPI ranks in a round-robin fashion in the CUDA backend, similar to the OpenCL backend. Previously, all ranks on a node used only the first device.
Use all GPUs on a node with the CUDA backend.
- I have checked that my code follows the Octopus coding standards