BATCH_DOTPV improvement

Sebastian Ohlmann requested to merge dotpv_streams into develop

Description

Use several CUDA streams for DOTPV_BATCH in the CUDA version. With this approach, the dot products with offsets can be overlapped effectively. The same approach is also implemented for mesh_batch_nrm2.
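
For readers unfamiliar with the idea, here is a minimal, self-contained sketch of overlapping dot products with offsets by distributing them over several CUDA streams. It is not the Octopus implementation: the names `dot_kernel` and `batch_dot_streams`, the contiguous layout with offset `i * n`, and the use of a single thread block per dot product are all assumptions made to keep the example short. Error checking of the CUDA calls is omitted for brevity.

```cuda
#include <cassert>
#include <cstddef>
#include <vector>
#include <cuda_runtime.h>

constexpr int kThreads = 256;  // threads per block; must be a power of two

// Hypothetical kernel: one thread block computes the dot product of x and y
// (both of length n) and writes the result to *out.
__global__ void dot_kernel(const double* x, const double* y, int n, double* out) {
    __shared__ double partial[kThreads];
    const int tid = threadIdx.x;

    // Each thread accumulates a strided slice of the vectors.
    double sum = 0.0;
    for (int i = tid; i < n; i += blockDim.x) sum += x[i] * y[i];
    partial[tid] = sum;
    __syncthreads();

    // Tree reduction in shared memory.
    for (int s = blockDim.x / 2; s > 0; s >>= 1) {
        if (tid < s) partial[tid] += partial[tid + s];
        __syncthreads();
    }
    if (tid == 0) *out = partial[0];
}

// Hypothetical host routine: computes out[i] = <x_i, y> for nvec vectors x_i
// stored contiguously in d_x (vector i starts at offset i * n). The kernels
// are issued round-robin on nstreams streams so that independent dot
// products can overlap on the device.
std::vector<double> batch_dot_streams(const double* d_x, const double* d_y,
                                      int n, int nvec, int nstreams) {
    assert(n > 0 && nvec > 0 && nstreams > 0);

    std::vector<cudaStream_t> streams(nstreams);
    for (auto& s : streams) cudaStreamCreate(&s);

    double* d_out = nullptr;
    cudaMalloc(&d_out, nvec * sizeof(double));

    for (int i = 0; i < nvec; ++i) {
        cudaStream_t s = streams[i % nstreams];
        dot_kernel<<<1, kThreads, 0, s>>>(
            d_x + static_cast<std::size_t>(i) * n, d_y, n, d_out + i);
    }

    // Wait for every stream before reading the results back.
    for (auto& s : streams) cudaStreamSynchronize(s);

    std::vector<double> out(nvec);
    cudaMemcpy(out.data(), d_out, nvec * sizeof(double), cudaMemcpyDeviceToHost);

    cudaFree(d_out);
    for (auto& s : streams) cudaStreamDestroy(s);
    return out;
}
```

Each individual launch in this sketch occupies only one block, so a single stream would leave most of the GPU idle; issuing the launches on different streams allows the device to run them concurrently. A norm such as the one in mesh_batch_nrm2 could follow the same pattern by passing the same vector as both arguments and taking the square root of each result on the host.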

Depends on !675 (merged).

Closes #226 (closed) and #234 (closed).

News snippet

Use streams for DOTPV_BATCH

Checklist

  • I have checked that my code follows the Octopus coding standards.
  • I have added tests for all the new features added in this request.
Edited by Martin Lueders