Tiled matrix multiplication - matrix size not a multiple of the tile size
Use the tutor's remarks to support matrix sizes that are not a multiple of the tile size:
Nice solution. :) To get it running with uneven matrices (n not a multiple of TILE_SIZE), you may change the following in your kernel launch:
dim3 dimBlock(TILE_SIZE,TILE_SIZE);
dim3 dimGrid((n-1) / TILE_SIZE + 1, (n-1) / TILE_SIZE + 1);
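The expression (n-1) / TILE_SIZE + 1 is an integer ceiling division: it rounds n / TILE_SIZE up, so the grid also covers the partially filled tile at the matrix edge. A minimal host-side sketch of the arithmetic follows. It assumes TILE_SIZE is 16 (the value is not given above) and n >= 1; the helper name tiles_needed is hypothetical.

```cuda
#define TILE_SIZE 16  // assumed value; the original does not state it

// Hypothetical helper: number of tiles needed to cover n elements (n >= 1).
// Same expression as in the dimGrid line above.
constexpr int tiles_needed(int n) { return (n - 1) / TILE_SIZE + 1; }

// Exact multiple: no extra tile is added.
static_assert(tiles_needed(96) == 6, "96 = 6 * 16");
// Non-multiple: one extra, partially filled tile covers the remainder.
static_assert(tiles_needed(100) == 7, "100 needs 7 tiles (7 * 16 = 112)");
// Smaller than one tile: a single block is still launched.
static_assert(tiles_needed(1) == 1, "one element still needs one tile");
```

With n = 100 the grid therefore launches 7 x 7 blocks, i.e. 112 x 112 threads, and the 12 surplus rows and columns of threads are the ones the kernel has to guard against.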
In the kernel, you also need to make sure that you don't read out of bounds or overwrite memory that doesn't belong to the matrix. Essentially, make sure that a thread does nothing if
i >= n || k >= n.
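One way to apply this in a shared-memory kernel is sketched below. It is not the original solution, which is not shown here. It assumes that i and k are the row and column of the output element (called row and col below), that the matrices are square, row-major float arrays, and that TILE_SIZE is 16; the names matmul_tiled and multiply_on_gpu are hypothetical. Note that a plain early return is risky in a tiled kernel: out-of-range threads still have to reach __syncthreads(), and some of them load tile elements that in-range threads need. The sketch therefore guards each global load (substituting 0) and the final store instead of returning early.

```cuda
#include <cuda_runtime.h>
#include <vector>

#define TILE_SIZE 16  // assumed value; not stated in the original

// Sketch: C = A * B for square n x n row-major matrices, n need not be a
// multiple of TILE_SIZE.
__global__ void matmul_tiled(const float* A, const float* B, float* C, int n)
{
    __shared__ float As[TILE_SIZE][TILE_SIZE];
    __shared__ float Bs[TILE_SIZE][TILE_SIZE];

    int row = blockIdx.y * TILE_SIZE + threadIdx.y;
    int col = blockIdx.x * TILE_SIZE + threadIdx.x;
    float acc = 0.0f;

    int numTiles = (n - 1) / TILE_SIZE + 1;  // ceiling division, as in dimGrid
    for (int t = 0; t < numTiles; ++t) {
        int aCol = t * TILE_SIZE + threadIdx.x;
        int bRow = t * TILE_SIZE + threadIdx.y;

        // Guarded loads: elements outside the matrix are replaced by 0,
        // which contributes nothing to the dot product.
        As[threadIdx.y][threadIdx.x] =
            (row < n && aCol < n) ? A[row * n + aCol] : 0.0f;
        Bs[threadIdx.y][threadIdx.x] =
            (bRow < n && col < n) ? B[bRow * n + col] : 0.0f;
        __syncthreads();  // reached by every thread of the block

        for (int k = 0; k < TILE_SIZE; ++k)
            acc += As[threadIdx.y][k] * Bs[k][threadIdx.x];
        __syncthreads();  // tiles must not be overwritten while still in use
    }

    // Guarded store: threads outside the matrix write nothing.
    if (row < n && col < n)
        C[row * n + col] = acc;
}

// Hypothetical host wrapper; error checking is omitted for brevity.
std::vector<float> multiply_on_gpu(const std::vector<float>& A,
                                   const std::vector<float>& B, int n)
{
    size_t bytes = size_t(n) * n * sizeof(float);
    float *dA, *dB, *dC;
    cudaMalloc(&dA, bytes);
    cudaMalloc(&dB, bytes);
    cudaMalloc(&dC, bytes);
    cudaMemcpy(dA, A.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, B.data(), bytes, cudaMemcpyHostToDevice);

    dim3 dimBlock(TILE_SIZE, TILE_SIZE);
    dim3 dimGrid((n - 1) / TILE_SIZE + 1, (n - 1) / TILE_SIZE + 1);
    matmul_tiled<<<dimGrid, dimBlock>>>(dA, dB, dC, n);

    std::vector<float> C(size_t(n) * n);
    // This blocking copy also waits for the kernel to finish.
    cudaMemcpy(C.data(), dC, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dC);
    return C;
}
```

Padding with zeros keeps the inner loop free of branches: every thread always runs TILE_SIZE multiply-adds per tile, and only the loads and the store depend on n.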