Merged GPU code - Cublas wrappers spread around

CUDA Fortran provides Fortran interfaces to the cuBLAS library. This makes it possible to use the same name for both the CPU and the GPU implementation of a routine.

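Unfortunately, these explicit interfaces make the compiler check argument types. That rules out the type mismatches in subroutine calls that appear quite often in QE, e.g. passing a COMPLEX array where a REAL one is expected.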

For this reason, wrappers are spread all over the code:

UtilXlib/device_helper.f90
Modules/cuda_subroutines.f90
KS_Solvers/PPCG/generic_cublas.f90
...

Why did various developers replicate the same code over and over? Laziness, maybe, but also the idea that different libraries/units should be independent of each other.

Still, all these wrappers should be collected in one place, and the appropriate place would be devxlib (although @fspiga will not like this).