CUDAFortran enabled UtilXlib

Pietro requested to merge QEF/q-e-gpu:mpicuda into develop

Introduction

This merge request provides CUDAFortran enabled subroutines for the message passing interfaces used in QE codes.

This is still a work in progress, but it has reached a stage where it may benefit from review and from a more general discussion about how to complete the merge.

A list of the most relevant changes follows.

Bugfixes:

Changes:

  • COMMON statements have been removed from mp_base.f90 and a new data_buffer module has been introduced to replace them. The buffers are allocated inside mp_start and deallocated inside mp_end. As long as mp_start is called before any other subroutine of the library, everything works out of the box.
  • Added intent(in) protection to some arguments of the mp_get and mp_put subroutines.
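
The COMMON-to-module change described above can be sketched as follows. This is only an illustration of the pattern, not the code in this merge request: the module name data_buffer_sketch, the array names buff_r and buff_i, the length maxb and the two helper subroutines are all assumptions made for the example.

```fortran
! Hypothetical sketch; names and sizes are illustrative, not the UtilXlib sources.
module data_buffer_sketch
  implicit none
  private
  public :: dp, buff_r, buff_i, allocate_buffers, deallocate_buffers

  integer, parameter :: dp = selected_real_kind(14, 200)
  integer, parameter :: maxb = 100000        ! assumed buffer length

  ! Module-level allocatables take the place of the old COMMON work arrays.
  real(dp), allocatable :: buff_r(:)
  integer,  allocatable :: buff_i(:)

contains

  subroutine allocate_buffers()
    ! Safe to call more than once: allocation happens only the first time.
    if (.not. allocated(buff_r)) allocate(buff_r(maxb))
    if (.not. allocated(buff_i)) allocate(buff_i(maxb))
  end subroutine allocate_buffers

  subroutine deallocate_buffers()
    if (allocated(buff_r)) deallocate(buff_r)
    if (allocated(buff_i)) deallocate(buff_i)
  end subroutine deallocate_buffers

end module data_buffer_sketch

program buffer_demo
  use data_buffer_sketch
  implicit none

  call allocate_buffers()            ! the role played by mp_start
  buff_r(1:3) = [1.0_dp, 2.0_dp, 3.0_dp]
  print *, 'buffers allocated: ', allocated(buff_r), allocated(buff_i)
  call deallocate_buffers()          ! the role played by mp_end
  print *, 'buffers allocated: ', allocated(buff_r), allocated(buff_i)
end program buffer_demo
```

With this layout, any routine that touches the buffers before the allocation call would access unallocated arrays, which is why the order of calls matters.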

New features:

  • All mp_* interfaces have been extended with subroutines that accept input and/or output arguments with the 'DEVICE' attribute (i.e. data residing in memory allocated on the accelerator device). The only subroutines still missing are mp_bcast_z_gpu and mp_bcast_zv_gpu.
  • The pre-processor directive __CUDA can be used to enable the additional set of subroutines dealing with data residing on the GPU.
  • The pre-processor directive __GPU_MPI enables support for CUDAFortran aware MPI APIs (provided by PGI).
  • A simple unit-testing system has been added to the library. At the time of writing it has high coverage of the CUDAFortran subroutines and minimal coverage of the other subroutines.
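
As a rough illustration of how the two pre-processor directives could interact with a generic interface, consider the sketch below. It is not taken from the merge request: mp_sum_sketch, mp_sum_rv and mp_sum_rv_gpu are hypothetical names, and the reduction itself is stubbed out (a real implementation would call MPI) so that the example stays self-contained. It assumes a file with a .F90 extension so that the preprocessor runs, and a CUDA Fortran compiler when __CUDA is defined.

```fortran
! Hypothetical sketch; save as a .F90 file so that the preprocessor runs.
module mp_sketch
#if defined(__CUDA)
  use cudafor
#endif
  implicit none
  private
  public :: dp, mp_sum_sketch

  integer, parameter :: dp = selected_real_kind(14, 200)

  ! One generic name; the device variant exists only when __CUDA is defined.
  interface mp_sum_sketch
    module procedure mp_sum_rv
#if defined(__CUDA)
    module procedure mp_sum_rv_gpu
#endif
  end interface mp_sum_sketch

contains

  subroutine mp_sum_rv(msg)
    real(dp), intent(inout) :: msg(:)
    ! Stub: on a single process the sum leaves msg unchanged.
    ! A real implementation would perform an MPI reduction here.
  end subroutine mp_sum_rv

#if defined(__CUDA)
  subroutine mp_sum_rv_gpu(msg_d)
    real(dp), device, intent(inout) :: msg_d(:)
#if defined(__GPU_MPI)
    ! Stub: with a CUDA-aware MPI the device buffer would go to MPI directly.
#else
    real(dp), allocatable :: msg_h(:)
    allocate(msg_h(size(msg_d)))
    msg_h = msg_d              ! device to host copy
    call mp_sum_rv(msg_h)      ! reuse the host path
    msg_d = msg_h              ! host to device copy
    deallocate(msg_h)
#endif
  end subroutine mp_sum_rv_gpu
#endif

end module mp_sketch

program mp_demo
  use mp_sketch
  implicit none
  real(dp) :: a(3)
#if defined(__CUDA)
  real(dp), device :: a_d(3)
#endif

  a = 1.0_dp
  call mp_sum_sketch(a)        ! resolves to the host variant
#if defined(__CUDA)
  a_d = a
  call mp_sum_sketch(a_d)      ! resolves to the device variant
  a = a_d
#endif
  print *, a
end program mp_demo
```

Without __CUDA the file reduces to plain Fortran, so callers keep a single generic name regardless of where the data lives.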

To be done before merging

  • Check out the original .gitlab-ci.yml to comply with the standard QE CI system. (done)
  • Add (at least) a README file to explain the compilation options. (done)

To be done after merging

  • Change the last call to mp_end so that it deallocates the UtilXlib buffers.

To be decided

  • Documentation: it is almost absent at the moment. Should it be added before merging this? If so, how? With Ford (if it works)?
  • Should these changes go into the develop branch even if they do not provide any functional improvement to QE, or should this merge request target a separate gpu-develop branch?
  • Testing system: should it be expanded with more and better coverage, including the CPU part?
  • Is the coding style acceptable?
  • Other desiderata for this merge request?

EDIT: reported updates from the latest commits.

EDIT: the new mp_end interface takes an optional argument that should be added to the calling codes.

Edited by Pietro
