Martin Lueders requested to merge 234-improve-gpu-performance-of-dotpv_batch into develop Oct 29, 2019

Description

The GPU version of DOTPV_BATCH has been rewritten by analogy with the MESH_BATCH_NRM2 routine. This avoids repeated calls to the accel_dot kernel and performs all dot products within a batch simultaneously, using a cuBLAS GEMV call and a new kernel that performs the pointwise multiplication of two batches.
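
The two-step idea described above (pointwise multiplication of the two batches, followed by one matrix-vector product) can be illustrated with a small CPU reference sketch. This is not the Octopus implementation: the function names, the `[n_points][n_states]` layout, the restriction to real values, and the use of a weight vector for the reduction are all assumptions made for illustration.

```python
# Illustrative CPU sketch only; names and data layout are hypothetical,
# not taken from the Octopus sources. Real-valued data is assumed
# (a complex version would need a conjugate on one operand).

def pointwise_mul(aa, bb):
    """Elementwise product of two batches stored as [n_points][n_states]."""
    return [[a * b for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(aa, bb)]


def gemv_transposed(mat, vec):
    """Compute y = mat^T * vec for mat [n_points][n_states], vec [n_points]."""
    n_states = len(mat[0]) if mat else 0
    out = [0.0] * n_states
    for row, weight in zip(mat, vec):
        for ist, val in enumerate(row):
            out[ist] += weight * val
    return out


def dotpv_batch_sketch(aa, bb, weights):
    """One dot product per state: a single multiply pass, then a single GEMV."""
    return gemv_transposed(pointwise_mul(aa, bb), weights)


if __name__ == "__main__":
    aa = [[1, 2], [3, 4], [5, 6]]   # 3 mesh points, 2 states
    bb = [[1, 1], [2, 2], [3, 3]]
    print(dotpv_batch_sketch(aa, bb, [1, 1, 1]))
```

In this sketch all states are reduced in a single pass over the mesh points, rather than with one reduction call per state, which is the property the rewrite aims for on the GPU.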

News snippet

Improved GPU performance of DOTPV_BATCH by rewriting the method to use fewer kernel calls.

Checklist

I have checked that my code follows the Octopus coding standards.
I have added tests for all the new features added in this request.

Closes #234 (closed)

Edited Nov 11, 2019 by Martin Lueders

WIP: Resolve "Improve GPU performance of DOTPV_BATCH"