zDOTPV_BATCH is inefficient on GPUs
We should rewrite this routine to reduce its cost. At the moment, it can account for as much as 25% of the total compute time of time propagation with the Lanczos exponential method.
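For discussion, here is a minimal sketch of the two ways such a batched dot product can be organised. It assumes that zDOTPV_BATCH returns one conjugated dot product per pair of corresponding vectors in two batches, stored here as the columns of 2D arrays. It also ignores any mesh volume element or domain-parallel reduction the real routine may apply. The function names are hypothetical and are not taken from the code base.

The first version computes one dot product per vector. On a GPU this pattern typically becomes one small reduction and one scalar transfer per vector. The second computes all of them in a single reduction over the grid-point axis.

```python
import numpy as np


def batched_zdotc_loop(a, b):
    """Hypothetical reference: one conjugated dot product per column pair, one at a time.

    a, b: complex arrays of shape (n_points, n_vectors); column j holds vector j.
    """
    n_vectors = a.shape[1]
    out = np.empty(n_vectors, dtype=np.complex128)
    for j in range(n_vectors):
        # np.vdot conjugates its first argument: sum_i conj(a[i, j]) * b[i, j]
        out[j] = np.vdot(a[:, j], b[:, j])
    return out


def batched_zdotc_fused(a, b):
    """Hypothetical fused variant: all dot products in a single reduction over axis 0."""
    # Elementwise conj(a) * b, then sum over the grid-point index i for every column j
    return np.einsum("ij,ij->j", a.conj(), b)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a = rng.standard_normal((1000, 8)) + 1j * rng.standard_normal((1000, 8))
    b = rng.standard_normal((1000, 8)) + 1j * rng.standard_normal((1000, 8))
    print(np.allclose(batched_zdotc_loop(a, b), batched_zdotc_fused(a, b)))
```

Both versions return the same values. The point of the sketch is only that the fused form exposes the whole batch to one reduction, which is the shape of work a GPU handles well. Whether this matches the actual bottleneck in zDOTPV_BATCH would need profiling.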