
WIP: Use batched cublas gemm calls for DOTPV_BATCH

Sebastian Ohlmann requested to merge cublas_batched into develop

Description

Use gemm_strided_batched to reduce the number of kernel launches to one per batch instead of one per state. The OpenCL version keeps the loop over gemm calls with offsets, because there is no batched gemm call for OpenCL.
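
To illustrate the difference between the two code paths, here is a minimal CPU-only sketch in C++. It is not the code from this merge request and does not call cuBLAS or an OpenCL BLAS. The names `gemm`, `gemm_strided_batched`, `gemm_kernel` and `launch_count` are hypothetical stand-ins: `launch_count` models the number of kernel launches, and the strided-batched signature is only loosely modelled on cuBLAS's `cublasDgemmStridedBatched` (column-major storage, one fixed stride per operand between consecutive batch entries).

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical counter standing in for the number of GPU kernel launches.
int launch_count = 0;

// Column-major C = alpha * A * B + beta * C, without transposition.
// A is m x k, B is k x n, C is m x n.
static void gemm_kernel(int m, int n, int k, double alpha,
                        const double* A, int lda,
                        const double* B, int ldb,
                        double beta, double* C, int ldc) {
    assert(lda >= m && ldb >= k && ldc >= m);
    for (int j = 0; j < n; ++j) {
        for (int i = 0; i < m; ++i) {
            double acc = 0.0;
            for (int p = 0; p < k; ++p) {
                acc += A[i + static_cast<std::size_t>(p) * lda] *
                       B[p + static_cast<std::size_t>(j) * ldb];
            }
            double& c = C[i + static_cast<std::size_t>(j) * ldc];
            // Follow the BLAS convention that beta == 0 ignores the old C.
            c = (beta == 0.0) ? alpha * acc : alpha * acc + beta * c;
        }
    }
}

// Stand-in for a plain gemm call: every call counts as one launch.
// Looping over states with pointer offsets costs one launch per state.
void gemm(int m, int n, int k, double alpha,
          const double* A, int lda, const double* B, int ldb,
          double beta, double* C, int ldc) {
    ++launch_count;
    gemm_kernel(m, n, k, alpha, A, lda, B, ldb, beta, C, ldc);
}

// Stand-in for a strided-batched gemm: a single launch handles all
// batch entries; entry b reads A + b * strideA and B + b * strideB
// and writes C + b * strideC.
void gemm_strided_batched(int m, int n, int k, double alpha,
                          const double* A, int lda, std::ptrdiff_t strideA,
                          const double* B, int ldb, std::ptrdiff_t strideB,
                          double beta,
                          double* C, int ldc, std::ptrdiff_t strideC,
                          int batch_count) {
    ++launch_count;
    for (int b = 0; b < batch_count; ++b) {
        gemm_kernel(m, n, k, alpha,
                    A + b * strideA, lda,
                    B + b * strideB, ldb,
                    beta,
                    C + b * strideC, ldc);
    }
}
```

In this sketch, calling `gemm` once per state with offsets `b * strideA`, `b * strideB` and `b * strideC` gives the same result as one `gemm_strided_batched` call, but increments `launch_count` once per state rather than once per batch. The loop form corresponds to what the OpenCL path keeps.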

News snippet

Use batched cublas gemm calls for DOTPV_BATCH

Checklist

Edited by Sebastian Ohlmann
