`fill()` failure on CUDA backend
Reported by @zoq; the following test program fails on the CUDA backend:
#include <bandicoot>
using namespace coot;

int main()
{
  // 5 x 100000 matrix: few rows, very many columns.
  Mat<float> x(5, 100000);
  x.fill(float(3));  // this call fails on the CUDA backend
}
This gives a `CUDA_ERROR_INVALID_VALUE` code from `cuLaunchKernel()` in `cuda::fill()`. Almost certainly this is just the dimensions of the kernel being wrong; I think this will be an easy fix, but I want to write down the issue so it doesn't get lost.
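For reference, here is a rough sketch of the kind of check that would catch a bad launch configuration before it reaches `cuLaunchKernel()`. It is plain C++ and does not touch the driver. `LaunchDims` and `launch_dims_valid()` are hypothetical names, not part of bandicoot, and the limits are the documented ones for compute capability 3.0 and later. I have not confirmed how `cuda::fill()` actually computes its grid, so treat the mapping discussed below as a guess.

```cpp
#include <cstddef>

// Hypothetical helper, not part of bandicoot: holds the six dimensions
// that would be passed to cuLaunchKernel().
struct LaunchDims
{
  std::size_t grid_x, grid_y, grid_z;
  std::size_t block_x, block_y, block_z;
};

// Returns true if the configuration is within the documented CUDA limits
// for compute capability 3.0 and later (assumed here, not queried from the device).
inline bool launch_dims_valid(const LaunchDims& d)
{
  // Zero-sized dimensions are treated as invalid.
  if (d.grid_x == 0 || d.grid_y == 0 || d.grid_z == 0) { return false; }
  if (d.block_x == 0 || d.block_y == 0 || d.block_z == 0) { return false; }

  // Grid limits: 2^31 - 1 in x, 65535 in y and z.
  if (d.grid_x > 2147483647u || d.grid_y > 65535u || d.grid_z > 65535u) { return false; }

  // Block limits: 1024 in x and y, 64 in z.
  if (d.block_x > 1024u || d.block_y > 1024u || d.block_z > 64u) { return false; }

  // At most 1024 threads per block in total.
  return (d.block_x * d.block_y * d.block_z) <= 1024u;
}
```

If the kernel were launched with one grid entry per column in the y dimension (an assumption), `launch_dims_valid({5, 100000, 1, 1, 1, 1})` would return `false`, because 100000 exceeds the 65535 limit on `grid_y`. That would be consistent with the failure only showing up for matrices with many columns.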