Fix failing submatrix extraction tests
I found that the tests submat_colwise_sum_conv_to and submat_rowwise_sum_conv_to were failing with both backends. This turned out to be because the kernel was written to expect a two-dimensional worker grid, but it was being launched with a one-dimensional worker grid. Easy fix.
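To illustrate this class of bug, here is a minimal host-side sketch in plain C++. It is not the actual kernel or launch code from this change: copy_submat_worker and launch_copy_submat are hypothetical names, and the simulated "grid" is just a pair of nested loops. The point is only that a worker indexed by (row, col) leaves most of the output untouched when the launch covers a single dimension.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a device kernel that expects a 2-D worker grid:
// worker (row, col) copies one element of the submatrix (column-major storage).
void copy_submat_worker(const std::vector<double>& src, std::size_t src_n_rows,
                        std::vector<double>& dst, std::size_t dst_n_rows,
                        std::size_t row_offset, std::size_t col_offset,
                        std::size_t row, std::size_t col)
{
  dst[row + col * dst_n_rows] =
      src[(row + row_offset) + (col + col_offset) * src_n_rows];
}

// Simulated launch: runs the worker once for every (row, col) in the grid.
// Passing grid_cols = 1 mimics launching the 2-D kernel with a 1-D grid.
std::vector<double> launch_copy_submat(const std::vector<double>& src,
                                       std::size_t src_n_rows,
                                       std::size_t row_offset, std::size_t col_offset,
                                       std::size_t sub_n_rows, std::size_t sub_n_cols,
                                       std::size_t grid_rows, std::size_t grid_cols)
{
  // The requested submatrix must lie inside the source matrix.
  assert(src_n_rows > 0 && src.size() % src_n_rows == 0);
  assert(row_offset + sub_n_rows <= src_n_rows);
  assert(col_offset + sub_n_cols <= src.size() / src_n_rows);

  std::vector<double> dst(sub_n_rows * sub_n_cols, 0.0);
  for (std::size_t col = 0; col < grid_cols; ++col)
  {
    for (std::size_t row = 0; row < grid_rows; ++row)
    {
      // Bounds guard, as a real kernel would have for oversized grids.
      if (row < sub_n_rows && col < sub_n_cols)
      {
        copy_submat_worker(src, src_n_rows, dst, sub_n_rows,
                           row_offset, col_offset, row, col);
      }
    }
  }
  return dst;
}
```

With a 4x4 source and a 2x2 submatrix, a 2x2 grid fills every output element, while a grid with grid_cols = 1 only fills the first column and leaves the rest at zero. Any sum computed over that output would then be wrong.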
While I was in there, I noticed that the conv_to tests were relying on specific overflow behavior, but overflows are undefined behavior. So I relaxed the tests so that they no longer depend on it.
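One way to write such a relaxed check is sketched below. This is an assumption about the shape of the fix, not the test code from this change: fits_in and converted_matches are hypothetical helpers, and the sketch assumes the conversions in question are floating-point to integer, where standard C++ leaves the result undefined if the truncated value does not fit in the destination type.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Returns true when converting `x` to the integer type T is well defined in
// standard C++, i.e. the truncated value is representable in T.
template<typename T>
bool fits_in(double x)
{
  static_assert(std::numeric_limits<T>::is_integer, "T must be an integer type");
  if (!std::isfinite(x)) { return false; }

  const double truncated = std::trunc(x);
  // Both bounds are powers of two (or zero), so they are exact as doubles.
  const double lowest = static_cast<double>(std::numeric_limits<T>::min());
  const double one_past_max = std::ldexp(1.0, std::numeric_limits<T>::digits);
  return truncated >= lowest && truncated < one_past_max;
}

// Relaxed comparison: only inputs with a well-defined conversion are checked.
// For out-of-range inputs any converted value is accepted.
template<typename T>
bool converted_matches(double input, T converted)
{
  if (!fits_in<T>(input)) { return true; }
  return converted == static_cast<T>(input);
}
```

A test would call converted_matches on each (input, output) pair instead of comparing against a hard-coded wrapped value, so in-range elements are still verified exactly while overflowing elements no longer pin down one particular result.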