Some optimizations for batch operations and more
Description
This request collects several optimizations for batch operations, plus a few related fixes:
- Fix some unit tests and add the missing FLOP count for norm2.
- Implement the batch scalar scal operation using BLAS. This is much faster than the original version.
- Replace some internal batch code with BLAS calls.
- Fix the exponential unit test: the timestep was too large, leading to NaNs after a large number of applications of the exponential.
- Add FLOP profiling data for the density.
- Change some inefficient OpenMP loops.
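
To illustrate why a too-large timestep can end in NaNs after many applications of the exponential, here is a minimal sketch in Python (not the Fortran used in Octopus). It assumes, purely for illustration, that the exponential is approximated by a 4th-order truncated Taylor series; the description above does not say which method the unit test uses. The names `taylor_exp_factor` and `propagate_mode` are hypothetical and are not part of the Octopus code base.

```python
import math


def taylor_exp_factor(z: complex, order: int = 4) -> complex:
    """Truncated Taylor series of exp(z): sum over k = 0..order of z**k / k!."""
    term = 1.0 + 0.0j
    total = 1.0 + 0.0j
    for k in range(1, order + 1):
        term *= z / k
        total += term
    return total


def propagate_mode(energy: float, dt: float, steps: int) -> complex:
    """Apply the truncated exp(-i * energy * dt) `steps` times to a unit amplitude.

    This follows a single eigenmode with eigenvalue `energy` (an assumption
    made to keep the sketch small), so each application is one multiplication.
    """
    factor = taylor_exp_factor(-1j * energy * dt)
    amplitude = 1.0 + 0.0j
    for _ in range(steps):
        amplitude *= factor
    return amplitude


def is_finite(value: complex) -> bool:
    """True if both the real and imaginary parts are finite numbers."""
    return math.isfinite(value.real) and math.isfinite(value.imag)


if __name__ == "__main__":
    for dt in (0.1, 3.0):
        growth = abs(taylor_exp_factor(-1j * dt))
        final = propagate_mode(energy=1.0, dt=dt, steps=5000)
        print(f"dt={dt}: |factor|={growth:.6f}, finite after 5000 steps: {is_finite(final)}")
```

Under this assumption, the per-step factor has magnitude slightly below 1 for a small `energy * dt`, so repeated application stays bounded. For `energy * dt = 3.0` the magnitude is about 1.5, so the amplitude grows geometrically until it overflows to Inf and then NaN. For the 4th-order truncation the magnitude exceeds 1 once `energy * dt` is larger than 2*sqrt(2), so in this picture the remedy is a smaller timestep.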
News snippet
Code optimization.
Checklist
- I have checked that my code follows the Octopus coding standards.
- I have added tests for all the new features added in this request.