Draft: compute the meta-GGA (MGGA) contribution to the Hamiltonian on the GPU
I want to run simulations using SCAN on Marconi100@CINECA.
Without this patch:

    cegterg      :   1291.32s CPU   1307.31s WALL (     720 calls)
    h_psi        :   1299.52s CPU   1310.02s WALL (    2252 calls)

    Called by h_psi:
    h_psi:calbec :      0.32s CPU      0.94s WALL (    2252 calls)
    h_psi_meta   :   1288.80s CPU   1293.19s WALL (    2252 calls)

With this patch:

    cegterg      :     58.12s CPU     75.03s WALL (     720 calls)
    h_psi        :     45.85s CPU     57.45s WALL (    2252 calls)

    Called by h_psi:
    h_psi:calbec :      0.36s CPU      1.01s WALL (    2252 calls)
    h_psi_meta   :     38.77s CPU     48.29s WALL (    2252 calls)

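To put the timings in proportion, here is a quick Python check of the wall-clock speedups. It only reuses the WALL times reported for the runs without and with the patch; the dictionary layout and the `speedup` helper are just for illustration.

```python
# Wall-clock times (seconds) copied from the two timing reports.
timings = {
    # routine: (wall time without patch, wall time with patch)
    "cegterg": (1307.31, 75.03),
    "h_psi": (1310.02, 57.45),
    "h_psi_meta": (1293.19, 48.29),
}


def speedup(before: float, after: float) -> float:
    """Return how many times faster the patched run is."""
    return before / after


for name, (before, after) in timings.items():
    print(f"{name:<11} {speedup(before, after):6.1f}x")

# Fraction of the h_psi wall time spent in h_psi_meta before the patch.
share = timings["h_psi_meta"][0] / timings["h_psi"][0]
print(f"h_psi_meta share of h_psi before the patch: {share:.1%}")
```

On these numbers the speedup is about 17x for cegterg, 23x for h_psi and 27x for h_psi_meta, and before the patch h_psi_meta accounted for roughly 98.7% of the h_psi wall time.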
It is still a draft because:
- it does not use batched FFTs yet, which would make it much faster;
- I have not tested the Gamma-point case; but, more importantly,
- it adds yet another duplicated subroutine in yet another file, so you may prefer to wait for the OpenACC version and close this PR.
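To illustrate what "batched FFTs" means in the first point, here is a minimal NumPy sketch, not the code in this patch: several independent 3D transforms are done either one call at a time or in a single call over a stack of arrays. The grid size, the random data and the choice of three fields (for example one per Cartesian direction) are assumptions for illustration. NumPy runs on the CPU, so this only shows that the two approaches give the same result; on the GPU the analogous mechanism would be a cuFFT plan created with `cufftPlanMany` and a batch count greater than one, which replaces several kernel launches with one.

```python
import numpy as np

# Hypothetical FFT grid and data: three complex fields stacked along axis 0.
rng = np.random.default_rng(0)
nr1, nr2, nr3 = 8, 8, 8
shape = (3, nr1, nr2, nr3)
fields = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)


def fft_one_by_one(stack: np.ndarray) -> np.ndarray:
    """Unbatched: one inverse 3D FFT call per field."""
    return np.stack([np.fft.ifftn(f) for f in stack])


def fft_batched(stack: np.ndarray) -> np.ndarray:
    """Batched: a single inverse FFT call over the last three axes of the stack."""
    return np.fft.ifftn(stack, axes=(1, 2, 3))


print(np.allclose(fft_one_by_one(fields), fft_batched(fields)))  # True
```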