Writing up a kernel for DOTPV.

Nicolas Tancogne-Dejean requested to merge kernel_dotpv into main

Description

Writing up a kernel for DOTPV.

The kernel uses a block-wise tree reduction with a final warp reduction, and an atomic operation for the inter-block reduction.
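
To make the reduction order concrete, here is a minimal sketch that emulates the three stages sequentially on the CPU in plain C++. It is not the kernel from this merge request: the names `block_dot` and `dot_emulated`, the block size of 128, and the warp size of 32 are assumptions chosen for illustration. On a GPU, the loops over `t` and `lane` would be parallel threads separated by barriers, the warp stage would typically use shuffle instructions, and the final accumulation would be an atomic add.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sizes for illustration only (not taken from the merge request).
constexpr std::size_t kBlockSize = 128;  // emulated threads per block
constexpr std::size_t kWarpSize = 32;    // emulated lanes per warp
static_assert(kBlockSize % kWarpSize == 0, "block must hold whole warps");

// Emulates one block: every "thread" accumulates a strided partial sum,
// then the partial sums are combined by a tree reduction.
double block_dot(const std::vector<double>& x, const std::vector<double>& y,
                 std::size_t block, std::size_t num_blocks) {
    double shared[kBlockSize];  // stands in for the block's shared memory

    // Stage 0: grid-stride loop, one private accumulator per thread.
    for (std::size_t t = 0; t < kBlockSize; ++t) {
        double acc = 0.0;
        for (std::size_t i = block * kBlockSize + t; i < x.size();
             i += num_blocks * kBlockSize) {
            acc += x[i] * y[i];
        }
        shared[t] = acc;
    }

    // Stage 1: block-wise tree reduction down to one warp's worth of values.
    // On a GPU each halving step would be followed by a block-level barrier.
    for (std::size_t stride = kBlockSize / 2; stride >= kWarpSize; stride /= 2) {
        for (std::size_t t = 0; t < stride; ++t) {
            shared[t] += shared[t + stride];
        }
    }

    // Stage 2: final warp reduction over the remaining kWarpSize values.
    // On a GPU this is where warp shuffles would replace shared memory.
    for (std::size_t offset = kWarpSize / 2; offset > 0; offset /= 2) {
        for (std::size_t lane = 0; lane < offset; ++lane) {
            shared[lane] += shared[lane + offset];
        }
    }
    return shared[0];
}

// Stage 3: inter-block reduction. The "+=" stands in for an atomic add
// into a single result location shared by all blocks.
double dot_emulated(const std::vector<double>& x, const std::vector<double>& y,
                    std::size_t num_blocks) {
    assert(x.size() == y.size());
    double result = 0.0;
    for (std::size_t block = 0; block < num_blocks; ++block) {
        result += block_dot(x, y, block, num_blocks);
    }
    return result;
}
```

For example, `dot_emulated(x, y, 4)` splits the work over four emulated blocks. Because the order of the atomic additions is not fixed on a real device, floating-point results may differ in the last bits from run to run; the sequential emulation above does not show that effect.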

This gives a speedup of a factor of 3 or more compared to multiple BLAS calls, as many more reductions are needed.

Closes #563 (closed).

News snippet

Improved performance for some GPU calculations.

Checklist

  • I have checked that my code follows the Octopus coding standards.
  • I have added tests for all the new features added in this request.
Edited by Nicolas Tancogne-Dejean