Use batched cuBLAS call in DOTPV_BATCH

Use gemm_strided_batched to reduce the number of kernel launches to just
one per batch instead of one per state. For the OpenCL version, keep the
loop over gemm calls with offsets because there is no batched gemm call
for OpenCL.
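
For illustration only, here is a minimal CPU-side sketch in plain C of the two call patterns described above. It is not the code from this commit. The names gemm_ref, gemm_strided_batched_ref, dots_looped and dots_batched are hypothetical, and the layout (one contiguous vector of np points per state, one 1x1 result per state) is an assumption. The strided-batched helper only mirrors the argument convention of calls such as cublasDgemmStridedBatched, where batch item s reads its operands at base + s * stride. On a GPU the single batched call replaces the per-state launches, while here it is just a loop inside one function.

```c
#include <assert.h>
#include <stddef.h>

/* Reference column-major GEMM: C = alpha * A * B + beta * C,
   with A (m x k), B (k x n), C (m x n). */
void gemm_ref(int m, int n, int k, double alpha,
              const double *a, int lda,
              const double *b, int ldb,
              double beta, double *c, int ldc)
{
    for (int j = 0; j < n; ++j) {
        for (int i = 0; i < m; ++i) {
            double acc = 0.0;
            for (int p = 0; p < k; ++p)
                acc += a[i + (size_t)p * lda] * b[p + (size_t)j * ldb];
            double *cij = &c[i + (size_t)j * ldc];
            /* As in BLAS, C is not read when beta is zero. */
            *cij = (beta == 0.0) ? alpha * acc : alpha * acc + beta * (*cij);
        }
    }
}

/* Hypothetical stand-in for a strided-batched GEMM: batch item s uses
   A + s*stride_a, B + s*stride_b and C + s*stride_c. */
void gemm_strided_batched_ref(int m, int n, int k, double alpha,
                              const double *a, int lda, long stride_a,
                              const double *b, int ldb, long stride_b,
                              double beta,
                              double *c, int ldc, long stride_c,
                              int batch_count)
{
    assert(batch_count >= 0);
    for (int s = 0; s < batch_count; ++s)
        gemm_ref(m, n, k, alpha,
                 a + s * stride_a, lda,
                 b + s * stride_b, ldb,
                 beta, c + s * stride_c, ldc);
}

/* Pattern kept for OpenCL: one GEMM call per state, with explicit offsets. */
void dots_looped(const double *a, const double *b, int np, int nst,
                 double *out)
{
    for (int s = 0; s < nst; ++s)
        gemm_ref(1, 1, np, 1.0,
                 a + (size_t)s * np, 1,
                 b + (size_t)s * np, np,
                 0.0, out + s, 1);
}

/* Batched pattern: a single call covers all states of the batch. */
void dots_batched(const double *a, const double *b, int np, int nst,
                  double *out)
{
    gemm_strided_batched_ref(1, 1, np, 1.0,
                             a, 1, np,
                             b, np, np,
                             0.0, out, 1, 1, nst);
}
```

Both helpers fill out[s] with the dot product of state s of a with state s of b, so under these assumptions the batched path can be checked against the looped path element by element.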

Status  Name                  Duration  Coverage
External
passed  codecov/patch                   86.66667%
passed  codecov/project                 70.8225%
passed  distcheck             00:06:29
passed  foss-2018a            00:14:43
passed  foss-2018a_debug      00:19:51
passed  foss-2018a_min        00:14:14
passed  foss-2018a_mpi        00:16:00
passed  foss-2018a_mpi_debug  00:20:44
passed  foss-2018a_mpi_min    00:15:05
passed  foss-2018a_mpi_opt    00:16:25
passed  foss-2018a_opt        00:17:30
passed  foss-2018a_ppc        00:18:41
passed  foss-2018a_ppc_mpi    00:14:29
passed  foss-2018a_valgrind   00:40:24
passed  foss-2018b            00:14:28
passed  foss-2018b_mpi        00:14:17
passed  foss-2019a            00:14:57
passed  foss-2019a_mpi        00:16:07
passed  fosscuda-2018a        00:26:04
passed  fosscuda-2018a_mpi    00:44:55
passed  intel-2018a           00:16:59
passed  intel-2018a_impi      00:17:07
passed  intel-2018a_impi_omp  00:25:03
passed  intel-2018a_omp       00:18:18
passed  intel-2018b           00:17:02
passed  intel-2018b_impi      00:16:24
passed  intel-2019a           00:16:38
passed  intel-2019a_impi      00:16:25
passed  pgi-2017.10           00:20:29
passed  pgi-2017.10_mpi       00:20:31
passed  tests                 02:32:11
failed  codecov/patch                   7.69231%
failed  codecov/patch                   7.69231%
failed  codecov/patch                   7.69231%
failed  codecov/patch                   7.69231%
failed  codecov/patch                   7.69231%
failed  codecov/patch                   7.69231%
failed  codecov/patch                   7.69231%
failed  codecov/patch                   7.69231%
failed  codecov/patch                   7.69231%
passed  codecov/patch                   86.66667%
passed  codecov/patch                   86.66667%
passed  codecov/patch                   86.66667%
failed  codecov/project                 69.56311%
failed  codecov/project                 69.33707%
failed  codecov/project                 66.46252%
passed  codecov/project                 70.8225%
failed  codecov/project                 66.4301%
passed  codecov/project                 70.8225%
failed  codecov/project                 69.34638%
failed  codecov/project                 69.51046%
failed  codecov/project                 69.50615%
passed  codecov/project                 70.66339%
failed  codecov/project                 69.56311%
failed  codecov/project                 69.56311%
failed  fosscuda-2018a_mpi    00:44:32