Sebastian Ohlmann requested to merge cublas_batched into develop Oct 17, 2019

Description

Use gemm_strided_batched so that each batch needs a single kernel launch instead of one launch per state. The OpenCL version keeps the loop over gemm calls with offsets, because there is no batched gemm call for OpenCL.
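
To illustrate the idea, here is a minimal CPU reference sketch of strided-batched gemm semantics. It is not the code in this merge request: the names `gemm`, `gemm_strided_batched` and `example_two_states` are hypothetical, and the storage layout (column-major, one small matrix per state, consecutive in memory) is an assumption made for the example. On the GPU, the corresponding cuBLAS routine for double precision is `cublasDgemmStridedBatched`, which handles all problems of the batch in one call. The loop inside `gemm_strided_batched` below mirrors what the OpenCL path has to do explicitly with pointer offsets.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Column-major reference GEMM: C = alpha * A * B + beta * C.
// A is m x k (leading dimension lda), B is k x n (ldb), C is m x n (ldc).
void gemm(int m, int n, int k, double alpha,
          const double* A, int lda,
          const double* B, int ldb,
          double beta, double* C, int ldc) {
    for (int j = 0; j < n; ++j) {
        for (int i = 0; i < m; ++i) {
            double acc = 0.0;
            for (int p = 0; p < k; ++p) {
                acc += A[i + static_cast<std::ptrdiff_t>(p) * lda] *
                       B[p + static_cast<std::ptrdiff_t>(j) * ldb];
            }
            double& c = C[i + static_cast<std::ptrdiff_t>(j) * ldc];
            c = alpha * acc + beta * c;
        }
    }
}

// Strided-batched semantics: problem b uses the matrices that start at
// A + b * strideA, B + b * strideB and C + b * strideC.
// A GPU library can do all of this in one launch; this CPU sketch
// (like the OpenCL fallback described above) loops over offsets.
void gemm_strided_batched(int m, int n, int k, double alpha,
                          const double* A, int lda, std::ptrdiff_t strideA,
                          const double* B, int ldb, std::ptrdiff_t strideB,
                          double beta,
                          double* C, int ldc, std::ptrdiff_t strideC,
                          int batch_count) {
    assert(batch_count >= 0);
    for (int b = 0; b < batch_count; ++b) {
        gemm(m, n, k, alpha,
             A + b * strideA, lda,
             B + b * strideB, ldb,
             beta,
             C + b * strideC, ldc);
    }
}

// Hypothetical usage: two states, each a 1x2 row vector times a 2x1
// column vector, i.e. one dot product per state.
std::vector<double> example_two_states() {
    const std::vector<double> A = {1.0, 2.0, 3.0, 4.0};  // two 1x2 matrices
    const std::vector<double> B = {5.0, 6.0, 7.0, 8.0};  // two 2x1 matrices
    std::vector<double> C(2, 0.0);                       // two 1x1 results
    gemm_strided_batched(/*m=*/1, /*n=*/1, /*k=*/2, /*alpha=*/1.0,
                         A.data(), /*lda=*/1, /*strideA=*/2,
                         B.data(), /*ldb=*/2, /*strideB=*/2,
                         /*beta=*/0.0,
                         C.data(), /*ldc=*/1, /*strideC=*/1,
                         /*batch_count=*/2);
    return C;
}
```

Calling `example_two_states()` yields one result per state from a single batched call. The strides are what let one call address every state's data without a separate pointer array.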

News snippet

Use batched cuBLAS gemm calls for DOTPV_BATCH

Checklist

I have checked that my code follows the Octopus coding standards

Edited Oct 17, 2019 by Sebastian Ohlmann

WIP: Use batched cublas gemm calls for DOTPV_BATCH
