BATCH_DOTPV improvement
Description
Use several streams for DOTPV_BATCH in the CUDA version. With this approach, the dot products with offsets can be overlapped effectively. The same approach is also implemented for mesh_batch_nrm2.
Depends on !675 (merged).
Closes #226 (closed) and #234 (closed).
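For reviewers unfamiliar with the idea, here is a minimal standalone sketch of overlapping several dot products by launching each on its own CUDA stream. It is not the code from this merge request: the kernel `dot_kernel`, the sizes `nst`, `np`, `nstreams`, and the contiguous one-state-after-another layout are all assumptions made for illustration, and the actual Octopus batch layout and kernels may differ.

```cuda
#include <cmath>
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel: a single block reduces the dot product of x and y
// over n points and writes the result to *out.
__global__ void dot_kernel(const double *x, const double *y, int n, double *out) {
  __shared__ double partial[256];
  double sum = 0.0;
  for (int i = threadIdx.x; i < n; i += blockDim.x) sum += x[i] * y[i];
  partial[threadIdx.x] = sum;
  __syncthreads();
  // Tree reduction in shared memory (blockDim.x must be a power of two <= 256).
  for (int s = blockDim.x / 2; s > 0; s >>= 1) {
    if (threadIdx.x < s) partial[threadIdx.x] += partial[threadIdx.x + s];
    __syncthreads();
  }
  if (threadIdx.x == 0) *out = partial[0];
}

int main() {
  const int nst = 8;        // number of states in the batch (assumed)
  const int np = 1 << 16;   // mesh points per state (assumed)
  const int nstreams = 4;   // number of streams to cycle through (assumed)

  // State ist is filled with (ist + 1) in x and 0.5 in y,
  // so its dot product is (ist + 1) * 0.5 * np.
  std::vector<double> hx(nst * np), hy(nst * np, 0.5), hout(nst);
  for (int ist = 0; ist < nst; ++ist)
    for (int i = 0; i < np; ++i) hx[ist * np + i] = ist + 1.0;

  double *dx, *dy, *dout;
  cudaMalloc(&dx, nst * np * sizeof(double));
  cudaMalloc(&dy, nst * np * sizeof(double));
  cudaMalloc(&dout, nst * sizeof(double));
  cudaMemcpy(dx, hx.data(), nst * np * sizeof(double), cudaMemcpyHostToDevice);
  cudaMemcpy(dy, hy.data(), nst * np * sizeof(double), cudaMemcpyHostToDevice);

  cudaStream_t streams[nstreams];
  for (int s = 0; s < nstreams; ++s) cudaStreamCreate(&streams[s]);

  // Each dot product works on its own offset into the batch buffers and is
  // queued on a stream chosen round-robin, so independent kernels may overlap.
  for (int ist = 0; ist < nst; ++ist) {
    const size_t offset = static_cast<size_t>(ist) * np;
    dot_kernel<<<1, 256, 0, streams[ist % nstreams]>>>(dx + offset, dy + offset, np, dout + ist);
  }

  // Wait for all streams before reading the results back.
  cudaDeviceSynchronize();
  cudaMemcpy(hout.data(), dout, nst * sizeof(double), cudaMemcpyDeviceToHost);

  int errors = 0;
  for (int ist = 0; ist < nst; ++ist) {
    const double expected = (ist + 1.0) * 0.5 * np;
    if (std::fabs(hout[ist] - expected) > 1e-9 * expected) ++errors;
  }
  std::printf("%s\n", errors == 0 ? "all dot products match" : "mismatch detected");

  for (int s = 0; s < nstreams; ++s) cudaStreamDestroy(streams[s]);
  cudaFree(dx);
  cudaFree(dy);
  cudaFree(dout);
  return errors == 0 ? 0 : 1;
}
```

Kernels queued on the same stream still run in order, so the number of streams bounds how many dot products can be in flight at once. Whether they actually overlap depends on the device and on how much of the GPU each kernel occupies.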
News snippet
Use streams for DOTPV_BATCH
Checklist
- I have checked that my code follows the Octopus coding standards.
- I have added tests for all the new features added in this request.
Edited by Martin Lueders