Optimize linear_solver_batch for GPU (move pack/unpack and rewrite mesh_batch_nrm2)

Description

Move the pack/unpack operations up to linear_solver_solve_HXeY_batch. This should speed up calculations by removing unnecessary pack/unpack operations. Furthermore, the mesh_batch_nrm2 routine has been optimized: the state-by-state calls to cublas_nrm2 are replaced by a single kernel that computes the modulus squared over the whole batch, followed by a zgemv that performs the summation over grid points.
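
For reviewers, the idea behind the new norm can be sketched in NumPy. This is only an illustration of the technique, not the Octopus implementation: the function names, the `(n_points, n_states)` array layout, and the omission of any mesh volume element are assumptions made for the example, and the matrix-vector product here is real-valued rather than a literal zgemv call.

```python
import numpy as np


def batch_nrm2_reference(psi):
    """State-by-state norm: one nrm2-style call per state (hypothetical baseline)."""
    return np.array([np.linalg.norm(psi[:, ist]) for ist in range(psi.shape[1])])


def batch_nrm2_fused(psi):
    """Whole-batch norm: one element-wise pass, then one matrix-vector product.

    psi is assumed to have shape (n_points, n_states).
    """
    # Step 1: modulus squared of every entry in the batch (the single "kernel").
    abs2 = psi.real**2 + psi.imag**2
    # Step 2: sum over grid points, written as a gemv with a vector of ones.
    ones = np.ones(psi.shape[0])
    sums = abs2.T @ ones  # shape (n_states,)
    return np.sqrt(sums)


# Both variants should agree on random complex data.
rng = np.random.default_rng(0)
psi = rng.standard_normal((1000, 4)) + 1j * rng.standard_normal((1000, 4))
assert np.allclose(batch_nrm2_reference(psi), batch_nrm2_fused(psi))
```

The fused form touches the whole batch in two calls, regardless of the number of states, which is what makes it attractive on a GPU where each per-state call carries launch overhead.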

News snippet

Speed up the linear_solver by reducing the number of pack/unpack operations and optimizing the calculation of the norm.

Checklist

  • I have checked that my code follows the Octopus coding standards.
  • I have added tests for all the new features added in this request.

Closes #229 (closed)

Edited by Martin Lueders
