Use CUDA-aware MPI
Description
Instead of copying the boundary from the device to the host, use device pointers so that the CUDA-aware MPI implementation can copy the data directly between the devices. Also overlap the transfer with the computation of the inner points to minimize the waiting time.
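The overlap relies on splitting the local points into inner points, which need no remote data, and boundary points, which do. The following is a minimal sketch of that ordering only, not the Octopus implementation: it uses a 1D three-point stencil, and the halo exchange is emulated by a background thread. All names (laplacian_at, fetch_halos, apply_overlapped, apply_reference) are hypothetical; in the real code the exchange would be a nonblocking MPI transfer posted on device pointers.

```python
from concurrent.futures import ThreadPoolExecutor


def laplacian_at(u, i):
    """Three-point stencil; u carries one halo cell on each side."""
    return u[i - 1] - 2.0 * u[i] + u[i + 1]


def fetch_halos(left_neighbor, right_neighbor):
    """Stand-in for the halo exchange (assumption: one halo cell per side).

    With CUDA-aware MPI this would be nonblocking sends and receives that
    take device pointers, so no device-to-host copy is needed.
    """
    return left_neighbor[-1], right_neighbor[0]


def apply_overlapped(owned, left_neighbor, right_neighbor):
    """Apply the stencil to the owned points while the exchange is in flight."""
    n = len(owned)
    u = [0.0] + list(owned) + [0.0]  # halo cells live at u[0] and u[n + 1]
    out = [0.0] * n
    with ThreadPoolExecutor(max_workers=1) as pool:
        # 1. Start the exchange without waiting for it.
        pending = pool.submit(fetch_halos, left_neighbor, right_neighbor)
        # 2. Inner points (u[2] .. u[n - 1]) never read a halo cell.
        for i in range(2, n):
            out[i - 1] = laplacian_at(u, i)
        # 3. Wait for the halo values only once the inner work is done.
        u[0], u[n + 1] = pending.result()
    # 4. Boundary points are the only ones that needed the remote data.
    out[0] = laplacian_at(u, 1)
    out[n - 1] = laplacian_at(u, n)
    return out


def apply_reference(owned, left_neighbor, right_neighbor):
    """Same result without overlap: exchange first, then compute everything."""
    u = [left_neighbor[-1]] + list(owned) + [right_neighbor[0]]
    return [laplacian_at(u, i) for i in range(1, len(owned) + 1)]
```

Both functions return the same values; the overlapped variant only changes when the wait happens, which is what hides the transfer time behind the inner-point computation.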
This feature can be enabled at configure time (--enable-cudampi) and at runtime with a variable in the input file (CudaAwareMPI).
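As an illustration, the two switches might be used as follows. This is a sketch under assumptions: the other configure options needed for a CUDA and MPI build are omitted, and the value yes assumes that CudaAwareMPI is a logical variable written in the usual Octopus input-file syntax.

```
# configure time (other options omitted)
./configure --enable-cudampi

# input file
CudaAwareMPI = yes
```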
This MR also fixes a bug in subarray_gather, where a synchronization was missing.
News snippet
Use CUDA-aware MPI
Checklist
- I have checked that my code follows the Octopus coding standards.
- I have added tests for all the new features added in this request.
Closes #224 (closed) and #162 (closed).
Edited by Martin Lueders