Breakup projector kernels
Description
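This merge request introduces more parallelism in the projector_bra and projector_bra_phase kernels, leading to better utilization of the GPU. In particular, the loop over the points of the projector submesh is now distributed across the threads of a warp: each thread strides through the points with a step size of warp_size, and the partial sums are then combined by a reduction across the threads within the warp. These intra-warp reductions are very efficient, since they need no synchronization with other warps of the thread block.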
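The access pattern described above can be illustrated with a small host-side sketch. It emulates one warp on the CPU: each of the 32 lanes accumulates the points ip = lane, lane + 32, lane + 64, ..., and a butterfly (XOR) reduction then combines the 32 partial sums. On a GPU, each round of the butterfly would typically be a single register shuffle (for example `__shfl_xor_sync` in CUDA, or an equivalent sub-group operation in OpenCL). The function and variable names (`warp_projector_bra`, `proj`, `psi`, `map`, `kWarpSize`) are hypothetical and do not reflect the actual Octopus kernel signatures; the sketch also assumes a real-valued projection of the form sum over ip of proj[ip] * psi[map[ip]].

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical warp width: 32 on NVIDIA GPUs (AMD wavefronts may be 64).
constexpr int kWarpSize = 32;

// Emulates one warp computing sum_ip proj[ip] * psi[map[ip]] over a submesh.
// Names and the data layout are illustrative assumptions, not Octopus code.
double warp_projector_bra(const std::vector<double>& proj,
                          const std::vector<double>& psi,
                          const std::vector<int>& map) {
  assert(map.size() == proj.size());
  const std::size_t npoints = proj.size();

  // One "register" per lane holding that lane's partial sum.
  std::array<double, kWarpSize> lane{};

  // Step 1: lane l visits points l, l + kWarpSize, l + 2*kWarpSize, ...
  // so consecutive lanes touch consecutive points in each iteration.
  for (int l = 0; l < kWarpSize; ++l) {
    double acc = 0.0;
    for (std::size_t ip = static_cast<std::size_t>(l); ip < npoints;
         ip += kWarpSize) {
      acc += proj[ip] * psi[map[ip]];
    }
    lane[l] = acc;
  }

  // Step 2: butterfly reduction in log2(kWarpSize) rounds. In each round,
  // lane l adds the value held by lane (l XOR offset); on a GPU this is one
  // shuffle instruction per round, with no shared memory involved.
  for (int offset = kWarpSize / 2; offset > 0; offset /= 2) {
    std::array<double, kWarpSize> next{};
    for (int l = 0; l < kWarpSize; ++l) {
      next[l] = lane[l] + lane[l ^ offset];
    }
    lane = next;
  }

  // After the butterfly, every lane holds the full sum; return lane 0's copy.
  return lane[0];
}
```

Compared with a layout where a single thread loops over all points of the submesh, this spreads each projection over 32 threads, which helps when the number of projectors times states is too small to fill the GPU on its own.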
News snippet
This merge request introduces more parallelism in the projector_bra and projector_bra_phase kernels, leading to better utilization of the GPU.
Checklist
- I have checked that my code follows the Octopus coding standards
- I have added tests for all the new features added in this request.
Closes #222.
Edited by Martin Lueders