Make the mesh_to_cube mapping local
Description
This removes the need to store the mapping for the global grid, which saves memory. Moreover, in the domain-parallel case, it is no longer necessary to gather the global function on every core and to perform the full mapping everywhere. Instead of using collective communication, the local mesh function and the local mapping are communicated pairwise, and the mapping is then executed locally on each core. This yields slightly better performance on CPUs and much better performance on GPUs, especially when using CUDA-aware MPI.
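To illustrate the idea of a local mapping with pairwise exchange, here is a minimal single-process sketch. It assumes a toy model that is not the Octopus implementation: "ranks" are list entries, mesh and cube points are plain global indices distributed in contiguous blocks, and "sending" a buffer is just a list lookup. The names `block_owner` and `mesh_to_cube_local` are hypothetical. Each rank only evaluates the mapping for its own mesh points, bins the values by the rank that owns the target cube point, and the receiver writes them into its local part of the cube.

```python
"""Toy sketch of a local mesh-to-cube mapping with pairwise exchange.

Hypothetical model (not Octopus code): ranks are simulated in one process,
and a message from rank src to rank dest is the list send[src][dest].
"""


def block_owner(index, n_points, n_ranks):
    """Rank owning a global index under a contiguous block distribution."""
    block = -(-n_points // n_ranks)  # ceiling division
    return index // block


def mesh_to_cube_local(local_mesh, mesh_to_cube, n_cube, n_ranks):
    """Map distributed mesh values onto a distributed cube.

    local_mesh[r]: dict {global mesh index: value} held by rank r.
    mesh_to_cube:  function giving the global cube index of a mesh point;
                   each rank calls it only for its own points, so no rank
                   stores a mapping table for the whole grid.
    Returns local_cube[r]: dict {global cube index: value} owned by rank r.
    """
    # Step 1: each rank (simulated by the outer loop) fills one send buffer
    # per destination rank, using only its local part of the mapping.
    send = [[[] for _ in range(n_ranks)] for _ in range(n_ranks)]
    for src in range(n_ranks):
        for ip, value in local_mesh[src].items():
            ic = mesh_to_cube(ip)
            dest = block_owner(ic, n_cube, n_ranks)
            send[src][dest].append((ic, value))

    # Step 2: pairwise exchange; rank dest receives exactly the buffers
    # addressed to it (no global gather of the mesh function).
    recv = [[send[src][dest] for src in range(n_ranks)] for dest in range(n_ranks)]

    # Step 3: each rank writes the received values into its local cube part.
    local_cube = []
    for dest in range(n_ranks):
        cube = {}
        for buffer in recv[dest]:
            for ic, value in buffer:
                cube[ic] = value
        local_cube.append(cube)
    return local_cube


if __name__ == "__main__":
    # 8 mesh points on 2 ranks; a toy permutation stands in for the mapping.
    mesh = [{ip: 10 * ip for ip in range(4)}, {ip: 10 * ip for ip in range(4, 8)}]
    cube = mesh_to_cube_local(mesh, lambda ip: (3 * ip) % 8, n_cube=8, n_ranks=2)
    for rank, part in enumerate(cube):
        print(rank, sorted(part.items()))
    # 0 [(0, 0), (1, 30), (2, 60), (3, 10)]
    # 1 [(4, 40), (5, 70), (6, 20), (7, 50)]
```

In this sketch no rank ever holds the full mapping table or the full mesh function. In a real MPI code, step 2 would presumably be done with point-to-point messages (or an `MPI_Alltoallv`-style exchange) so that only pairs of ranks that actually share data need to communicate.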
Overall, this makes the cube_to_mesh/mesh_to_cube routines less memory-hungry and is a further step towards removing access to global mesh functions.
News snippet
Make the mesh_to_cube mapping local
Checklist
- I have checked that my code follows the Octopus coding standards
- I have added tests for all the new features added in this request.
Edited by Sebastian Ohlmann