CUDAFortran enabled UtilXlib
Introduction
This merge request provides CUDAFortran enabled subroutines for the message passing interfaces used in QE codes.
This is still a work in progress, but it has reached a stage where it may benefit from a review and from a more general discussion about how to complete the merge.
A list of the most relevant changes follows.
Bugfixes:
- this merge request fixes issue #16 (closed)
Changes:
- `COMMON` statements have been removed from `mp_base.f90` and a new `data_buffer` module has been introduced to replace them. The allocation and deallocation of the buffer space are done inside `mp_start` and `mp_end`. If `mp_start` is always called before using the other subroutines of the library, everything will work out of the box.
- Added some `intent(in)` protection to the `mp_get` and `mp_put` subroutines.
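The buffer handling described above can be sketched as follows. This is only an illustration under stated assumptions, not the code in this merge request: the module name `demo_data_buffer`, the routines `demo_start`, `demo_end` and `demo_put`, the variable `buff` and the length `maxb` are all hypothetical stand-ins.

```fortran
! Hypothetical stand-in for a module that replaces COMMON-based buffers.
module demo_data_buffer
  implicit none
  private
  public :: dp, buff, demo_start, demo_end, demo_put

  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: maxb = 1024          ! assumed buffer length
  real(dp), allocatable :: buff(:)           ! module data instead of COMMON

contains

  ! Plays the role of the start routine: allocates the buffer once.
  subroutine demo_start()
    if (.not. allocated(buff)) allocate(buff(maxb))
  end subroutine demo_start

  ! Plays the role of the end routine: releases the buffer.
  subroutine demo_end()
    if (allocated(buff)) deallocate(buff)
  end subroutine demo_end

  ! Copies src into dest through the shared buffer.
  ! intent(in) makes src read-only, so the compiler rejects writes to it.
  subroutine demo_put(dest, src)
    real(dp), intent(inout) :: dest(:)
    real(dp), intent(in)    :: src(:)
    integer :: n
    n = min(size(dest), size(src), maxb)
    buff(1:n) = src(1:n)
    dest(1:n) = buff(1:n)
  end subroutine demo_put

end module demo_data_buffer

program demo
  use demo_data_buffer
  implicit none
  real(dp) :: a(8), b(8)

  call demo_start()                          ! must come before any other call
  if (.not. allocated(buff)) error stop 'buffer not allocated'

  a = 3.0_dp
  b = 0.0_dp
  call demo_put(b, a)
  if (any(b /= a)) error stop 'copy failed'

  call demo_end()
  if (allocated(buff)) error stop 'buffer still allocated'
  print *, 'ok'
end program demo
```

In this sketch, calling `demo_put` before `demo_start` would touch an unallocated array, which is why the start routine has to come first.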
New features:
- All `mp_*` interfaces have been expanded with subroutines accepting input and/or output arguments with the 'DEVICE' attribute (i.e. memory allocated on the accelerated device). The only missing functions are `mp_bcast_z_gpu` and `mp_bcast_zv_gpu`.
- The pre-processor directive `__CUDA` can be used to enable the additional set of subroutines dealing with data residing on the GPU.
- The pre-processor directive `__GPU_MPI` enables support for CUDAFortran aware MPI APIs (provided by PGI).
- A simple system for unit-testing has been added to the library. At the time of writing it has high coverage of CUDAFortran subroutines, minimal coverage of the other subroutines.
To be done before merging
- Checkout original `.gitlab-ci.yml` to comply with standard QE CI system. (done)
- Add (at least) a README file to explain compilation options. (done)
To be done after merging
- Change last call to `mp_end` in order to deallocate UtilXlib buffers.
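One possible shape for a final call that also releases the buffers is an optional flag tested with `present()`. The sketch below is an assumption for illustration only: the routines `demo_start` and `demo_end`, the argument name `cleanup` and the array `work` are hypothetical, and the actual argument of the new `mp_end` interface is not specified here.

```fortran
module demo_finalize
  implicit none
  private
  public :: work, demo_start, demo_end

  real, allocatable :: work(:)     ! stands in for the library buffers

contains

  subroutine demo_start()
    if (.not. allocated(work)) allocate(work(16))
  end subroutine demo_start

  ! Hypothetical end routine: buffers are released only when the
  ! optional flag is passed and true, so existing calls keep working.
  subroutine demo_end(cleanup)
    logical, intent(in), optional :: cleanup
    logical :: do_cleanup
    do_cleanup = .false.
    if (present(cleanup)) do_cleanup = cleanup
    if (do_cleanup .and. allocated(work)) deallocate(work)
  end subroutine demo_end

end module demo_finalize

program demo
  use demo_finalize
  implicit none

  call demo_start()

  call demo_end()                  ! intermediate call: buffers are kept
  if (.not. allocated(work)) error stop 'buffers released too early'

  call demo_end(cleanup=.true.)    ! last call: buffers are released
  if (allocated(work)) error stop 'buffers not released'

  print *, 'ok'
end program demo
```

Defaulting the flag to false keeps every existing call site valid, at the cost of having to update the one call that is meant to be last.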
To be decided
- Documentation: almost absent now, should it be added before merging this? How? Ford (if it works)?
- Should these changes go into the develop branch even if they do not provide any functional improvement to QE or should this merge request target a separate `gpu-develop` branch?
- Testing system: should it be expanded with more/better coverage also of the CPU part?
- Is the coding style acceptable?
- Other desiderata for this merge request?
EDIT: reported updates from last commits.
EDIT: new `mp_end` interface requires optional argument that should be added to the codes.
Edited by Pietro