Improve performance of subspace iteration
Increase the use of BLAS3-type (matrix-matrix) operations and reduce parallel communication wherever possible.
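For illustration only, here is a minimal NumPy sketch of what a block-oriented subspace iteration could look like. It is not the existing implementation; the function name `subspace_iteration` and its parameters are hypothetical, and it assumes a dense symmetric matrix. The point is that the operator is applied to the whole basis in one matrix-matrix product and the basis is orthonormalized as a block, instead of working one column at a time. In a parallel setting, block orthogonalization would typically need fewer global reductions than column-by-column Gram-Schmidt, though that is not demonstrated here.

```python
import numpy as np

def subspace_iteration(A, k, iters=200, seed=0):
    """Hypothetical sketch: k dominant eigenpairs of a dense symmetric A."""
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    # Random orthonormal starting basis with k columns.
    V, _ = np.linalg.qr(rng.standard_normal((n, k)))
    for _ in range(iters):
        # Apply A to all k vectors at once: one matrix-matrix product
        # (BLAS3-type) instead of k separate matrix-vector products.
        W = A @ V
        # Orthonormalize the whole block in a single QR factorization.
        V, _ = np.linalg.qr(W)
    # Rayleigh-Ritz projection, again built from matrix-matrix products.
    H = V.T @ (A @ V)
    theta, S = np.linalg.eigh(H)
    # Sort Ritz pairs by decreasing magnitude.
    order = np.argsort(-np.abs(theta))
    return theta[order], V @ S[:, order]
```

A fixed iteration count is used to keep the sketch short; a real solver would check residual norms and lock converged vectors.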