CUDAFortran enabled UtilXlib (!24)

Pietro requested to merge QEF/q-e-gpu:mpicuda into develop Mar 08, 2018

Introduction

This merge request provides CUDAFortran-enabled subroutines for the message-passing interfaces used in QE codes.

This is still a WIP, but it has reached a stage where it would benefit from review and from a more general discussion about how to complete the merge.

A list of the most relevant changes follows.

Bugfixes:

This merge request fixes issue #16 (closed).

Changes:

COMMON statements have been removed from mp_base.f90 and replaced by a new data_buffer module. The buffer space is allocated in mp_start and deallocated in mp_end, so as long as mp_start is called before any other subroutine of the library, everything works out of the box.
Added intent(in) protection to some arguments of the mp_get and mp_put subroutines.
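
The buffer change described above can be illustrated with a minimal sketch. It is not the code in this merge request: the module name, buffer names, and buffer length below are hypothetical and chosen only to show the pattern of module-level allocatable buffers that are set up once at start and released once at the end.

```fortran
! Hypothetical sketch: names and sizes are illustrative, not taken from UtilXlib.
module data_buffer_sketch
  implicit none
  integer, parameter :: dp = selected_real_kind(14, 200)
  integer, parameter :: maxb = 10000          ! assumed buffer length
  ! Module-level allocatable arrays take the place of fixed-size COMMON blocks.
  real(dp), allocatable :: buff_r(:)
  integer,  allocatable :: buff_i(:)
contains
  subroutine allocate_buffers()
    ! Safe to call more than once: already allocated buffers are left alone.
    if (.not. allocated(buff_r)) allocate(buff_r(maxb))
    if (.not. allocated(buff_i)) allocate(buff_i(maxb))
  end subroutine allocate_buffers

  subroutine deallocate_buffers()
    if (allocated(buff_r)) deallocate(buff_r)
    if (allocated(buff_i)) deallocate(buff_i)
  end subroutine deallocate_buffers
end module data_buffer_sketch

program demo
  use data_buffer_sketch
  implicit none
  call allocate_buffers()      ! the role played by mp_start
  buff_r(1:3) = 1.0_dp         ! buffers are usable only after this point
  call deallocate_buffers()    ! the role played by mp_end
end program demo
```

Using a buffer before the allocation step is an error in this pattern, which is why the order of the start and end calls matters.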

New features:

All mp_* interfaces have been expanded with subroutines accepting input and/or output arguments with the 'DEVICE' attribute (i.e. memory allocated on the accelerated device). The only missing functions are mp_bcast_z_gpu and mp_bcast_zv_gpu.
The pre-processor directive __CUDA can be used to enable the additional set of subroutines dealing with data residing on the GPU.
The pre-processor directive __GPU_MPI enables support for CUDAFortran aware MPI APIs (provided by PGI).
A simple unit-testing system has been added to the library. At the time of writing it has high coverage of the CUDAFortran subroutines and minimal coverage of the other subroutines.
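
As a rough illustration of how a generic interface can gain a device variant guarded by the pre-processor, here is a sketch in CUDA Fortran. All names (mp_sketch, mp_sum_sketch, sum_host, sum_device) are hypothetical, the MPI calls are left out, and staging through a host copy when __GPU_MPI is not defined is an assumption made for the example, not a description of what this merge request does. The file needs to be pre-processed, and the device branch needs a CUDA Fortran compiler.

```fortran
! Hypothetical sketch: none of these names come from UtilXlib.
module mp_sketch
#if defined(__CUDA)
  use cudafor
#endif
  implicit none
  integer, parameter :: dp = selected_real_kind(14, 200)

  ! One generic name; the device variant exists only when __CUDA is defined.
  interface mp_sum_sketch
    module procedure sum_host
#if defined(__CUDA)
    module procedure sum_device
#endif
  end interface

contains

  subroutine sum_host(buf)
    real(dp), intent(inout) :: buf(:)
    ! A real implementation would call MPI here; the sketch leaves buf unchanged.
  end subroutine sum_host

#if defined(__CUDA)
  subroutine sum_device(buf_d)
    real(dp), device, intent(inout) :: buf_d(:)
#if defined(__GPU_MPI)
    ! With a CUDA-aware MPI the device buffer would be handed to MPI directly.
#else
    ! Assumed fallback: stage the data through a temporary host array.
    real(dp), allocatable :: buf_h(:)
    allocate(buf_h(size(buf_d)))
    buf_h = buf_d              ! device-to-host copy
    call sum_host(buf_h)
    buf_d = buf_h              ! host-to-device copy
    deallocate(buf_h)
#endif
  end subroutine sum_device
#endif

end module mp_sketch

program demo
  use mp_sketch
  implicit none
  real(dp) :: a(4)
#if defined(__CUDA)
  real(dp), device :: a_d(4)
#endif
  a = 1.0_dp
  call mp_sum_sketch(a)        ! resolves to sum_host
#if defined(__CUDA)
  a_d = a
  call mp_sum_sketch(a_d)      ! resolves to sum_device via the device attribute
#endif
end program demo
```

Without __CUDA the module compiles as plain Fortran with only the host variant, so callers that never touch device data are unaffected.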

To be done before merging

~~Checkout original .gitlab-ci.yml to comply with standard QE CI system.~~ (done)
~~Add (at least) a README file to explain compilation options.~~ (done)

To be done after merging

Change last call to mp_end in order to deallocate UtilXlib buffers.

To be decided

Documentation: it is almost absent at the moment. Should it be added before merging? If so, how? With Ford (if it works)?
Should these changes go into the develop branch even though they do not provide any functional improvement to QE, or should this merge request target a separate gpu-develop branch?
Testing system: should it be expanded with more and better coverage, including the CPU part?
Is the coding style acceptable?
Other desiderata for this merge request?

EDIT: reported updates from the latest commits.
EDIT: the new mp_end interface requires an optional argument that should be added to the codes.

Edited Jan 10, 2019 by Pietro
