WIP: Resolve "Improve GPU performance of DOTPV_BATCH"
Description
The GPU version of DOTPV_BATCH has been rewritten analogously to the MESH_BATCH_NRM2 routine. Instead of calling the accel_dot kernel repeatedly, it now computes all dot products within a batch simultaneously, using a new kernel that performs the pointwise multiplication of two batches and a cuBLAS GEMV call.
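The scheme can be illustrated with a minimal CPU sketch in C. It assumes a hypothetical batch layout in which each of the `nst` vectors is stored contiguously with `np` points (element `(p, i)` at index `i * np + p`), and it uses real arithmetic for simplicity; a complex version would conjugate the first factor. The names `batch_mul`, `gemv_t` and `dotpv_batch_sketch` are illustrative only and do not correspond to Octopus routines. Step 1 plays the role of the new pointwise-multiplication kernel; step 2 is the reduction over points expressed as a transposed matrix-vector product with a vector of ones, i.e. the kind of operation a `cublasDgemv` call with `CUBLAS_OP_T` would perform on the device.

```c
#include <assert.h>
#include <stddef.h>

/* Step 1 (role of the new kernel): c(p, i) = a(p, i) * b(p, i) for all points
   and all vectors of the batch, in one pass. Hypothetical layout: vector i
   occupies a[i * np .. i * np + np - 1]. */
static void batch_mul(size_t np, size_t nst, const double *a, const double *b,
                      double *c)
{
    for (size_t i = 0; i < nst; i++)
        for (size_t p = 0; p < np; p++)
            c[i * np + p] = a[i * np + p] * b[i * np + p];
}

/* Step 2 (role of the GEMV): y = C^T w, where C is the np x nst product
   batch. With w = ones, y[i] is the sum over points of column i. */
static void gemv_t(size_t np, size_t nst, const double *c, const double *w,
                   double *y)
{
    for (size_t i = 0; i < nst; i++) {
        double s = 0.0;
        for (size_t p = 0; p < np; p++)
            s += c[i * np + p] * w[p];
        y[i] = s;
    }
}

/* All nst dot products of the batch: one elementwise pass plus one GEMV,
   instead of nst separate dot-product calls. */
static void dotpv_batch_sketch(size_t np, size_t nst, const double *a,
                               const double *b, double *work,
                               const double *ones, double *dot)
{
    assert(a && b && work && ones && dot);
    batch_mul(np, nst, a, b, work);
    gemv_t(np, nst, work, ones, dot);
}
```

For example, with `np = 4`, `nst = 2`, `a = {1,2,3,4, 1,1,1,1}` and `b = {1,1,1,1, 2,0,2,0}`, the call fills `dot` with `{10, 4}`. The point of the design is that the number of launches no longer grows with the number of vectors in the batch: two operations cover the whole batch regardless of `nst`.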
News snippet
Improved GPU performance of DOTPV_BATCH by rewriting the method to use fewer kernel calls.
Checklist
- I have checked that my code follows the Octopus coding standards
- I have added tests for all the new features added in this request.
Closes #234
Edited by Martin Lueders