
Break up projector kernels

Description

This merge request introduces more parallelism in the projector_bra and projector_bra_phase kernels, leading to better utilization of the GPU. In particular, the loop over the points of the projector submesh now strides by warp_size. Each thread of a warp therefore accumulates a partial sum over its own subset of points, and these partial sums are then combined by a reduction across the threads within the warp. Such warp-level reductions are very efficient.
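To illustrate the scheme, here is a minimal Python simulation of one warp computing a single projector matrix element. It assumes 32-lane warps, a projector function stored as a flat array over its submesh points, and a map from submesh points to global mesh indices. All names (`WARP_SIZE`, `lane_partial_sums`, `warp_reduce`, `projector_bra_sketch`) are hypothetical and are not Octopus identifiers. The actual kernels run on the GPU and may implement the intra-warp reduction differently; the butterfly pattern below is just one common way to do it.

```python
# Hypothetical CPU simulation of a warp-strided loop followed by a warp reduction.
# Names are illustrative only; they do not correspond to Octopus routines.

WARP_SIZE = 32  # assumption: 32 threads ("lanes") per warp


def lane_partial_sums(proj, psi_on_sphere, warp_size=WARP_SIZE):
    """Lane l accumulates points l, l + warp_size, l + 2*warp_size, ..."""
    npoints = len(proj)
    partial = [0.0] * warp_size
    for lane in range(warp_size):
        acc = 0.0
        for ip in range(lane, npoints, warp_size):
            acc += proj[ip] * psi_on_sphere[ip]
        partial[lane] = acc
    return partial


def warp_reduce(values):
    """Butterfly (xor-partner) reduction; requires a power-of-two lane count.

    After log2(n) steps every lane holds the total sum.
    """
    vals = list(values)
    n = len(vals)
    offset = n // 2
    while offset > 0:
        # The comprehension reads only the previous step's values, mimicking
        # all lanes exchanging data with their partner at the same time.
        vals = [vals[lane] + vals[lane ^ offset] for lane in range(n)]
        offset //= 2
    return vals


def projector_bra_sketch(proj, psi, sphere_map, dvol, warp_size=WARP_SIZE):
    """Compute sum_p proj[p] * psi[sphere_map[p]] * dvol using one simulated warp."""
    # Gather the wavefunction values at the projector submesh points.
    psi_on_sphere = [psi[i] for i in sphere_map]
    partial = lane_partial_sums(proj, psi_on_sphere, warp_size)
    total = warp_reduce(partial)[0]  # lane 0 holds the full sum
    return total * dvol


if __name__ == "__main__":
    proj = [float(i % 7) for i in range(100)]
    psi = [0.5 * j for j in range(200)]
    sphere_map = list(range(0, 200, 2))
    serial = sum(p * psi[m] for p, m in zip(proj, sphere_map)) * 0.1
    print(projector_bra_sketch(proj, psi, sphere_map, 0.1), serial)
```

The strided access keeps neighbouring lanes on neighbouring submesh points, and the reduction needs only log2(warp_size) steps, so the work per thread shrinks roughly by a factor of warp_size compared with one thread looping over all points.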

News snippet

This merge request introduces more parallelism in the projector_bra and projector_bra_phase kernels, leading to better utilization of the GPU.

Checklist

  • I have checked that my code follows the Octopus coding standards
  • I have added tests for all the new features added in this request.

Closes #222.

Edited by Martin Lueders
