OpenACC optimization fixes
Some optimization fixes related to OpenACC:
- the upper bounds of some loops were read from derived-type members (mostly of dfft). They have been replaced with local scalar variables to avoid implicit copies by OpenACC (noticed by @bellenlau);
- a missing 'collapse' clause has been added to a loop in fft_helper_subroutines.f90;
- a new invfft call in sum_band (polaron case) has been wrapped with wave_g2r.
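
The first two items follow a pattern that can be sketched roughly as below. This is a minimal illustration, not code from this PR: the type `fft_desc`, its members `nnr` and `nbnd`, and the array `psi` are hypothetical stand-ins for the real descriptor and data. It assumes that scalar loop bounds are passed to the compute region by value, while a bound read from a derived-type member may make the compiler move the whole object implicitly.

```fortran
program acc_loop_bounds
  implicit none

  ! Hypothetical stand-in for an FFT descriptor such as dfft (not the real type).
  type :: fft_desc
     integer :: nnr   ! number of grid points (illustrative)
     integer :: nbnd  ! number of bands in the batch (illustrative)
  end type fft_desc

  type(fft_desc) :: desc
  real(8), allocatable :: psi(:,:)
  integer :: i, j, nnr, nbnd

  desc%nnr  = 8
  desc%nbnd = 3
  allocate(psi(desc%nnr, desc%nbnd))
  psi = 1.0d0

  ! Copy the bounds into local scalars before the compute region, so the
  ! loop headers do not reference members of the derived type.
  nnr  = desc%nnr
  nbnd = desc%nbnd

  ! collapse(2) merges the two tightly nested loops into one iteration space.
  ! Without OpenACC enabled, the directive is an ordinary comment.
  !$acc parallel loop collapse(2) copy(psi)
  do j = 1, nbnd
     do i = 1, nnr
        psi(i, j) = 2.0d0 * psi(i, j)
     end do
  end do

  ! Simple self-check: every element should have been doubled.
  if (any(psi /= 2.0d0)) error stop "unexpected result"
  print *, "sum =", sum(psi)
end program acc_loop_bounds
```

The sketch builds with or without OpenACC support, so the same source can be used to compare the CPU and GPU results.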