Draft: OpenMP 5 offloading (WIP)
tests UPDATED to commit: adb9433b
(previously fully tested commit: 14dc80fa)
To be done:
- more OMP offload in PW (Done: noncolin and gamma cases in vloc_psi.f90 fully offloaded, except the task-group (tg) case; v_xc offloaded; h_psi offloaded (no real space) with inner data mapping; calbec_k, calbec_gamma, calbec_nc offloaded with inner data mapping, still to be tested on more than 1 GPU)
- OMP offload in XClib (Done)
- configure build system (currently OMP offload only works by manually editing make.inc)
- cmake build system
- LAXlib
- merge AMD libraries (Done)
To be fixed:
- MPI + GPU with Cray
- PPCG algorithm with Intel compiler on devcloud
- ugly workaround in Modules/Makefile: the ifx compiler fails to compile space_group.o, so ifort must be used for that file only
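The per-file compiler switch mentioned above can be expressed as an explicit rule in make that overrides the default compiler for one object. A hypothetical sketch (rule and variable names assumed, not taken from the actual Modules/Makefile):

```makefile
# Hypothetical sketch: build everything with ifx, but force ifort
# for the one file ifx cannot compile (space_group.f90).
F90 = ifx

space_group.o: space_group.f90
	ifort $(FFLAGS) -c $< -o $@
```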
Fixed:
- Intel compiler compiles without -D__USE_DISPATCH, but then crashes at runtime (Done: a temporary OMP-offloaded buffer psic_omp has been defined alongside the usual psic, for the OMP-offloaded FFTs)
- find a workaround for omp dispatch with the Cray compiler (GNU and NVHPC also complain about dispatch) (Done: dispatch directives have been protected and can be switched on with the __USE_DISPATCH flag)
- gfortran complains when it finds map clauses with derived-type components, e.g. !$omp target exit data map(delete:dfft%nl) (Done: those directives have been protected with __OPENMP_GPU)
- crashes on stress calculation with nvfortran (on GPU) + MKL (Done: there was a small bug in PW/src/gradutils.f90)
- simpler OPENMP_GPU logic inside FFTXlib interfaces (Done: the FFTXlib interfaces are now the same as in the official develop)
- clearer distinction between CUDA and OMP5 routines (Done: _omp is now appended to OMP routines and modules to distinguish them from the CUDA _gpu ones)
- bug fix: hpc-sdk (GPU) + MKL (CPU) (see Tests) (Done: fft_scalar.DFTI.f90 has been restored to the official develop version, and a new file fft_scalar.DFTIOMP.f90 has been introduced specifically for OMP offloading)
Tests:
1. Intel software stack (GPU) + MKL
- Setup: ifx (IFORT) 2022.1.0 20220316
- Hardware: devcloud Intel(R) Xeon(R) E-2176G CPU @ 3.70GHz + gen9 GPUs
- Compilation (with __USE_DISPATCH): OK
- FFTXlib tests (with __USE_DISPATCH): All passed
- PW test-suite (with __USE_DISPATCH): 228/232 Passed (PPCG fails)
2. AMD software stack
- Compilation (without __USE_DISPATCH): OK; some PW tests passed with 1 rank
3. GNU software stack (CPU) + MKL
- Setup: GNU Fortran (GCC) 10.2.0 + MKL 2020.4.304
- Hardware: local cluster with Intel(R) Xeon(R) Gold 6140 CPU @ 2.30GHz + 2 GV100 GPUs
- Compilation: OK
- PW test-suite (4 mpi, 2 threads): 232/232 Passed
4. hpc-sdk software stack (CPU) + MKL
- Setup: nvfortran 21.3-0 LLVM 64-bit target on x86-64 Linux -tp skylake + MKL 2020.4.304
- Hardware: local cluster with Intel(R) Xeon(R) Gold 6140 CPU @ 2.30GHz + 2 GV100 GPUs
- Compilation: OK
- PW test-suite (4 mpi, 2 threads): 232/232 Passed
5. hpc-sdk software stack (GPU) + MKL
- Setup: nvfortran 21.3-0 LLVM 64-bit target on x86-64 Linux -tp skylake + MKL 2020.4.304
- Hardware: local cluster with Intel(R) Xeon(R) Gold 6140 CPU @ 2.30GHz + 2 GV100 GPUs
- Compilation: OK
- FFTXlib tests: All passed
- PW test-suite (4 mpi, 2 threads, 2 gpu): 232/232 Passed
6. hpc-sdk software stack (CPU) on m100
- Setup: nvfortran 21.5 + spectrum_mpi/10.4.0
- Hardware: m100
- Compilation: OK
- PW test-suite (4 mpi, 8 threads): 232/232 Passed
7. hpc-sdk software stack (GPU) on m100
- Setup: nvfortran 21.5 + spectrum_mpi/10.4.0 + cuda/11.0
- Hardware: m100
- Compilation: OK
- FFTXlib tests: to be done
- PW test-suite (4 mpi, 8 threads, 4 GPUs): 232/232 Passed
Edited by Laura Bellentani