Martin Lueders requested to merge 229-move-pack-unpack-routines-in-the-linear_solver_batch into develop Oct 17, 2019

Description

Move the pack/unpack operations up to linear_solver_solve_HXeY_batch. This should speed up calculations by removing unnecessary pack/unpack operations. Furthermore, the mesh_batch_nrm2 routine has been optimized: the state-by-state calls to cublas_nrm2 are replaced by one kernel that computes the modulus-square over the whole batch, followed by a zgemv that performs the summation over grid points.
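
For reviewers, the idea behind the mesh_batch_nrm2 change can be sketched in a few lines of plain Python. This is only an illustration and not the Octopus code: the function name `batch_nrm2`, the `batch[ip][ist]` layout, and the `volume_element` parameter are assumptions made for the example, and the real-valued loop in the second pass merely stands in for the zgemv call.

```python
import math


def batch_nrm2(batch, volume_element=1.0):
    """Return the 2-norm of every state in a batch using two passes.

    batch[ip][ist] is the complex value of state `ist` at grid point `ip`
    (hypothetical layout, chosen only for this sketch).
    """
    n_points = len(batch)
    n_states = len(batch[0]) if n_points else 0

    # Pass 1 (stands in for the single kernel): element-wise modulus-square
    # over the whole batch, instead of one norm call per state.
    abs2 = [[z.real * z.real + z.imag * z.imag for z in row] for row in batch]

    # Pass 2 (stands in for the gemv): multiply the transposed matrix of
    # squared moduli by a vector of ones, which sums over grid points.
    ones = [1.0] * n_points
    sums = [0.0] * n_states
    for ip in range(n_points):
        for ist in range(n_states):
            sums[ist] += abs2[ip][ist] * ones[ip]

    return [math.sqrt(volume_element * s) for s in sums]


def nrm2_single_state(batch, ist):
    """Reference: the state-by-state norm that the batched version replaces."""
    return math.sqrt(sum(abs(row[ist]) ** 2 for row in batch))
```

Both functions should agree for every state; the batched form is attractive on a GPU because it needs two launches for the whole batch rather than one library call per state.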

News snippet

Speed up the linear_solver by reducing the number of pack/unpack operations and optimizing the calculation of the norm.

Checklist

I have checked that my code follows the Octopus coding standards.
I have added tests for all the new features added in this request.

Closes #229 (closed)

Edited Jul 13, 2020 by Martin Lueders

Optimize linear_solver_batch for GPU (move pack/unpack and rewrite mesh_batch_nrm2)
