CPU-GPU merger for Davidson diagonalization with ScaLAPACK in the complex case

First step: GPU Davidson is still very inefficient.
