Fix band group parallelism in DFPT+U
DFPT+U with band group parallelization was giving incorrect results. This MR fixes the following issues.
- `mp_sum` should be called over `intra_bgrp_comm` instead of `intra_pool_comm` when the quantity is not band-distributed.
- `use_bgrp_in_hpsi` needs to be set to `.false.` in `lr_orthoUwfc` (as done in `PW/orthoUwfc`) because it calls `h_psi` and `s_psi`. (Alternatively, `lr_orthoUwfc` should deal with band parallelization in computing `bec`.)
- There was a wrong factor in `zstar_eu_us.f90`. The same quantity is computed on all cores and later summed with `mp_sum`, so one divides by the number of cores. Since the quantity is distributed among bands, the factor should include only `n_PW * n_pool`, not `n_bgrp`.
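The normalization issue in the last item can be illustrated with a small serial simulation. This is only a sketch and not the Fortran code in `zstar_eu_us.f90`: the function name `reduce_over_all` and the variable `n_replica` are hypothetical, with `n_replica` standing in for the `n_PW * n_pool` processes that hold identical copies of a band slice.

```python
def reduce_over_all(band_terms, n_replica, n_bgrp):
    """Simulate a global sum over all processes.

    Each of the n_bgrp band groups holds a distinct slice of the bands,
    and each slice is duplicated on n_replica processes (hypothetical
    stand-in for n_PW * n_pool in the description above).
    """
    total = 0.0
    for g in range(n_bgrp):
        # Round-robin distribution of bands over band groups (an assumption).
        partial = sum(band_terms[g::n_bgrp])
        # Every replica of this band group contributes the same partial sum.
        for _ in range(n_replica):
            total += partial
    return total


band_terms = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]  # per-band contributions
exact = sum(band_terms)                      # serial reference value

n_replica, n_bgrp = 4, 3
summed = reduce_over_all(band_terms, n_replica, n_bgrp)

# Dividing by the replicated processes only recovers the serial value.
correct = summed / n_replica
# Also dividing by n_bgrp underestimates the result by a factor n_bgrp.
wrong = summed / (n_replica * n_bgrp)

print(exact, correct, wrong)
```

Band groups contribute different terms rather than copies of the same term, so they must not appear in the divisor.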
Benchmark
Tested using `test_suite/ph_U_insulator_PAW`, `BN.phG.in`.
develop, -np 1 -nb 1
freq ( 1) = 1.503409 [THz] = 50.148314 [cm-1]
freq ( 2) = 1.503409 [THz] = 50.148314 [cm-1]
freq ( 3) = 1.826946 [THz] = 60.940368 [cm-1]
freq ( 4) = 25.447537 [THz] = 848.838470 [cm-1]
freq ( 5) = 40.996606 [THz] = 1367.499592 [cm-1]
freq ( 6) = 40.996606 [THz] = 1367.499592 [cm-1]
This MR, -np 1 -nb 1
freq ( 1) = 1.503409 [THz] = 50.148314 [cm-1]
freq ( 2) = 1.503409 [THz] = 50.148314 [cm-1]
freq ( 3) = 1.826946 [THz] = 60.940368 [cm-1]
freq ( 4) = 25.447537 [THz] = 848.838470 [cm-1]
freq ( 5) = 40.996606 [THz] = 1367.499592 [cm-1]
freq ( 6) = 40.996606 [THz] = 1367.499592 [cm-1]
develop, -np 3 -nb 3 (INCORRECT)
freq ( 1) = 14.810254 [THz] = 494.016888 [cm-1]
freq ( 2) = 14.810254 [THz] = 494.016888 [cm-1]
freq ( 3) = 20.489036 [THz] = 683.440683 [cm-1]
freq ( 4) = 50.867129 [THz] = 1696.744792 [cm-1]
freq ( 5) = 54.340698 [THz] = 1812.610592 [cm-1]
freq ( 6) = 54.340698 [THz] = 1812.610592 [cm-1]
This MR, -np 3 -nb 3
freq ( 1) = 1.503409 [THz] = 50.148314 [cm-1]
freq ( 2) = 1.503409 [THz] = 50.148314 [cm-1]
freq ( 3) = 1.826946 [THz] = 60.940368 [cm-1]
freq ( 4) = 25.447537 [THz] = 848.838470 [cm-1]
freq ( 5) = 40.996606 [THz] = 1367.499592 [cm-1]
freq ( 6) = 40.996606 [THz] = 1367.499592 [cm-1]
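A comparison like the one above can be automated with a short script. The following is a minimal sketch, assuming the frequency lines have the format shown in this description; the helper names `parse_freqs` and `max_abs_diff` are hypothetical and not part of the test suite.

```python
import re

# Matches lines such as: freq ( 1) = 1.503409 [THz] = 50.148314 [cm-1]
FREQ_RE = re.compile(
    r"freq\s*\(\s*(\d+)\)\s*=\s*([-\d.]+)\s*\[THz\]\s*=\s*([-\d.]+)\s*\[cm-1\]"
)


def parse_freqs(text):
    """Return the frequencies in cm-1, in the order they appear."""
    return [float(m.group(3)) for m in FREQ_RE.finditer(text)]


def max_abs_diff(a, b):
    """Largest absolute difference between two equally long lists."""
    if len(a) != len(b):
        raise ValueError("different number of modes")
    return max(abs(x - y) for x, y in zip(a, b))


# Values copied from the benchmark listings above.
serial = """
freq ( 1) = 1.503409 [THz] = 50.148314 [cm-1]
freq ( 2) = 1.503409 [THz] = 50.148314 [cm-1]
freq ( 3) = 1.826946 [THz] = 60.940368 [cm-1]
freq ( 4) = 25.447537 [THz] = 848.838470 [cm-1]
freq ( 5) = 40.996606 [THz] = 1367.499592 [cm-1]
freq ( 6) = 40.996606 [THz] = 1367.499592 [cm-1]
"""
develop_parallel = """
freq ( 1) = 14.810254 [THz] = 494.016888 [cm-1]
freq ( 2) = 14.810254 [THz] = 494.016888 [cm-1]
freq ( 3) = 20.489036 [THz] = 683.440683 [cm-1]
freq ( 4) = 50.867129 [THz] = 1696.744792 [cm-1]
freq ( 5) = 54.340698 [THz] = 1812.610592 [cm-1]
freq ( 6) = 54.340698 [THz] = 1812.610592 [cm-1]
"""

ref = parse_freqs(serial)
broken = parse_freqs(develop_parallel)
print("max deviation of develop, -np 3 -nb 3 [cm-1]:", max_abs_diff(ref, broken))
```

The output of this MR with `-np 3 -nb 3` is identical to the serial listing, so the same check applied to it gives a deviation of zero.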
Edited by Jae-Mo Lihm