Writing up a kernel for DOTPV.
Description
The kernel uses a block-wise tree reduction with a final warp reduction, and an atomic operation for the inter-block reduction.
This yields a speedup of a factor of 3 or more compared to multiple BLAS calls, as many more reductions are needed.
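The reduction pattern described above can be sketched in CUDA as follows. This is only an illustrative sketch for a single dot product of two real vectors, not the kernel added in this request: the names `dot_kernel`, `dot_gpu` and `BLOCK_SIZE`, the data layout, and the launch configuration are all assumptions, and error checking is omitted. It also assumes a device that supports `atomicAdd` on `double` (compute capability 6.0 or later).

```cuda
#include <vector>
#include <cuda_runtime.h>

// Hypothetical block size; this sketch needs a power of two >= 64.
constexpr int BLOCK_SIZE = 256;

// Hypothetical kernel: *result must be zeroed before the launch.
__global__ void dot_kernel(const double* x, const double* y,
                           double* result, int n) {
  __shared__ double partial[BLOCK_SIZE];
  const int tid = threadIdx.x;

  // Grid-stride loop: each thread accumulates a private partial sum.
  double sum = 0.0;
  for (int i = blockIdx.x * blockDim.x + tid; i < n;
       i += gridDim.x * blockDim.x) {
    sum += x[i] * y[i];
  }
  partial[tid] = sum;
  __syncthreads();

  // Block-wise tree reduction in shared memory, until the block's
  // contribution is held in the first 32 entries.
  for (int stride = blockDim.x / 2; stride >= 32; stride >>= 1) {
    if (tid < stride) partial[tid] += partial[tid + stride];
    __syncthreads();
  }

  // Final warp reduction with shuffles, no shared memory or barriers.
  if (tid < 32) {
    double v = partial[tid];
    for (int offset = 16; offset > 0; offset >>= 1) {
      v += __shfl_down_sync(0xffffffffu, v, offset);
    }
    // Inter-block reduction: one atomic add per block.
    if (tid == 0) atomicAdd(result, v);
  }
}

// Hypothetical host wrapper, for illustration only.
double dot_gpu(const std::vector<double>& x, const std::vector<double>& y) {
  const int n = static_cast<int>(x.size());
  const size_t bytes = n * sizeof(double);
  double *dx, *dy, *dres;
  cudaMalloc(&dx, bytes);
  cudaMalloc(&dy, bytes);
  cudaMalloc(&dres, sizeof(double));
  cudaMemcpy(dx, x.data(), bytes, cudaMemcpyHostToDevice);
  cudaMemcpy(dy, y.data(), bytes, cudaMemcpyHostToDevice);
  cudaMemset(dres, 0, sizeof(double));  // all-zero bytes is 0.0

  int blocks = (n + BLOCK_SIZE - 1) / BLOCK_SIZE;
  if (blocks < 1) blocks = 1;
  if (blocks > 1024) blocks = 1024;  // grid-stride loop covers the rest
  dot_kernel<<<blocks, BLOCK_SIZE>>>(dx, dy, dres, n);

  double res = 0.0;
  cudaMemcpy(&res, dres, sizeof(double), cudaMemcpyDeviceToHost);
  cudaFree(dx);
  cudaFree(dy);
  cudaFree(dres);
  return res;
}
```

Because the order in which blocks perform their atomic add is not fixed, the floating-point result of a scheme like this can differ at round-off level from one run to the next.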
Closes #563 (closed).
News snippet
Improved performance for some GPU calculations.
Checklist
- I have checked that my code follows the Octopus coding standards.
- I have added tests for all the new features added in this request.
Edited by Nicolas Tancogne-Dejean