Davidson ported with OpenACC, CPU and GPU versions merged into one

Ivan Carnimeo requested to merge icarnimeo/q-e:ks_acc into develop

NOTES:

  1. MYDDOT_VECTOR_GPU has been added to UtilXlib, in order to perform ddot inside GPU kernels using the OpenACC "vector" level of parallelism.

  2. Four new subroutines (mp_sum_rm_nc, mp_sum_cm_nc, mp_sum_rm_nc_gpu, mp_sum_cm_nc_gpu, where "nc" means "non-contiguous") have been added to the mp_sum interface, in order to perform mp_sum on non-contiguous arrays: Call mp_sum(a(k1:k2,k3:k4), MPI_COMM) --> Call mp_sum(a, k1, k2, k3, k4, MPI_COMM). The new subroutines allocate a buffer (msg_buff) internally to pack the input array. In the older version of cegterg_gpu the buffer had to be allocated outside, before the call to mp_sum (see pinned_buffer in the old cegterg_gpu), so the calling code is now simpler.

  3. Regarding memory consumption (cf. mp_sum_cm_nc_gpu vs mp_sum_cm_gpu):

  • in the __GPU_MPI case, the overall GPU memory consumption is unchanged, because the pinned_buffer allocation in cegterg_gpu has simply been replaced by the msg_buff allocation in mp_sum_cm_nc_gpu;
  • in the non-__GPU_MPI case, the overall GPU memory consumption is reduced, because pinned_buffer is no longer allocated outside mp_sum, and msg_buff (mp_sum_cm_nc_gpu) is used in place of msg_h (mp_sum_cm_gpu);
  • the CPU case in cegterg has been guarded with __CUDA, because in this case the regular mp_sum (mp_sum_cm_gpu) works well and there is no reason to call mp_sum_cm_nc_gpu and allocate msg_buff.
  4. Note that only mp_sum_cm_nc_gpu is used at the moment; the other routines (mp_sum_rm_nc, mp_sum_cm_nc, mp_sum_rm_nc_gpu) are included for completeness.
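
The packing scheme described in note 2 can be sketched roughly as follows. This is a minimal serial illustration, not the actual UtilXlib code: the module name nc_sum_sketch, the routine sum_nc_sketch and the reduce callback are hypothetical, the array is real (as in the "rm" variants) rather than complex, indices are assumed to start at 1, and the callback stands in for the in-place global sum over the MPI communicator.

```fortran
module nc_sum_sketch
  implicit none
  private
  public :: sum_nc_sketch, reduce_iface

  integer, parameter :: dp = kind(1.0d0)

  abstract interface
     ! Hypothetical stand-in for an in-place global sum of a contiguous
     ! buffer (in real code this role would be played by an MPI reduction).
     subroutine reduce_iface(buf, n)
       import :: dp
       integer, intent(in) :: n
       real(dp), intent(inout) :: buf(n)
     end subroutine reduce_iface
  end interface

contains

  ! Sum the sub-block a(k1:k2, k3:k4) without passing an array section:
  ! the caller hands over the whole array plus the bounds, and the
  ! contiguous buffer is allocated here instead of at the call site.
  subroutine sum_nc_sketch(a, k1, k2, k3, k4, reduce)
    real(dp), intent(inout) :: a(:, :)
    integer, intent(in) :: k1, k2, k3, k4
    procedure(reduce_iface) :: reduce
    real(dp), allocatable :: msg_buff(:)
    integer :: i, j, nrow, ncol

    nrow = k2 - k1 + 1
    ncol = k4 - k3 + 1
    if (nrow <= 0 .or. ncol <= 0) return

    allocate(msg_buff(nrow * ncol))

    ! Pack the non-contiguous block column by column into msg_buff.
    do j = k3, k4
       do i = k1, k2
          msg_buff((j - k3) * nrow + (i - k1) + 1) = a(i, j)
       end do
    end do

    ! Reduce the contiguous buffer in place.
    call reduce(msg_buff, nrow * ncol)

    ! Unpack the reduced values back into the original block.
    do j = k3, k4
       do i = k1, k2
          a(i, j) = msg_buff((j - k3) * nrow + (i - k1) + 1)
       end do
    end do

    deallocate(msg_buff)
  end subroutine sum_nc_sketch

end module nc_sum_sketch
```

With this shape the caller writes something like `call sum_nc_sketch(a, k1, k2, k3, k4, my_reduce)` (my_reduce being any routine matching reduce_iface), and elements of `a` outside the requested block are left untouched.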
Edited by Ivan Carnimeo