WIP: Resolve "Improve GPU performance of DOTPV_BATCH"

Description

The GPU version of DOTPV_BATCH has been rewritten by analogy with the MESH_BATCH_NRM2 routine. This avoids repeated calls to the accel_dot kernel and computes all dot products within a batch simultaneously, using a cuBLAS GEMV call and a new kernel that performs the pointwise multiplication of two batches.
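
To illustrate the idea, here is a minimal CPU-side sketch in plain Python, not the Octopus implementation. It assumes each batch is stored as a list of states, each state a list of grid-point values, and that the dot product conjugates the first argument. The function names (`pointwise_multiply`, `gemv`, `dot_batch`) and the optional `weights` vector are hypothetical; on the GPU the first step would be the new kernel and the second the cuBLAS GEMV call.

```python
def pointwise_multiply(aa, bb):
    """Elementwise conj(aa) * bb for two batches stored as [state][point]."""
    return [
        [complex(a).conjugate() * b for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(aa, bb)
    ]


def gemv(matrix, vector):
    """Matrix-vector product: one reduction over the points per state."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def dot_batch(aa, bb, weights=None):
    """All dot products of a batch at once: pointwise product, then GEMV.

    `weights` is a hypothetical per-point weight vector; a vector of ones
    gives the plain (unweighted) dot product.
    """
    if len(aa) != len(bb):
        raise ValueError("batches must contain the same number of states")
    n_points = len(aa[0]) if aa else 0
    if weights is None:
        weights = [1.0] * n_points
    return gemv(pointwise_multiply(aa, bb), weights)


def dot_batch_reference(aa, bb):
    """Reference: one separate dot product per state, as in the old approach."""
    result = []
    for row_a, row_b in zip(aa, bb):
        acc = 0
        for a, b in zip(row_a, row_b):
            acc += complex(a).conjugate() * b
        result.append(acc)
    return result
```

Both functions return the same values; the difference is that `dot_batch` needs two batch-wide operations regardless of the number of states, whereas the reference version performs one reduction per state.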

News snippet

Improved the GPU performance of DOTPV_BATCH by rewriting the method to use fewer kernel calls.

Checklist

  • I have checked that my code follows the Octopus coding standards.
  • I have added tests for all the new features added in this request.

Closes #234 (closed)

Edited by Martin Lueders
