Use pinned memory for the batched wavefunction arrays to speed up host-device memory transfers
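Normal Fortran allocations return pageable host memory. This leads to much lower host-device transfer speeds: only 5 GB/s instead of 12-13 GB/s. The CUDA driver cannot transfer pageable memory directly, so it first copies the data into an internal pinned staging buffer. Pinned (page-locked) memory should therefore be used for these arrays, allocated in C using cudaMallocHost.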