Fix logic for two-dimensional CUDA grid dimensions
This fixes #33. Basically, my math was wrong, and no existing test case happened to trigger the incorrect result until the one in #33, which I've added here too.
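For context, here is a minimal sketch of the usual way to size a two-dimensional CUDA grid. It is not the code from this PR. The helper names `ceil_div` and `grid_dims_2d` are hypothetical, and the limits assume compute capability 3.0 or newer. The key step is rounding up with ceiling division. One common way a bug like this can hide is that tests which only use sizes that are exact multiples of the block size never exercise the rounding.

```python
# Hypothetical helpers for illustration only; not taken from this PR.
MAX_GRID_X = 2**31 - 1  # gridDim.x limit, assuming compute capability >= 3.0
MAX_GRID_Y = 65535      # gridDim.y limit


def ceil_div(a: int, b: int) -> int:
    """Ceiling of a / b for non-negative a and positive b, using integers only."""
    return -(-a // b)


def grid_dims_2d(width: int, height: int, block_x: int, block_y: int) -> tuple[int, int]:
    """Smallest (grid_x, grid_y) whose blocks cover a width x height domain."""
    if width <= 0 or height <= 0:
        raise ValueError("domain must be non-empty")
    if block_x <= 0 or block_y <= 0:
        raise ValueError("block dimensions must be positive")
    grid_x = ceil_div(width, block_x)
    grid_y = ceil_div(height, block_y)
    if grid_x > MAX_GRID_X or grid_y > MAX_GRID_Y:
        raise ValueError(f"grid ({grid_x}, {grid_y}) exceeds CUDA grid limits")
    return grid_x, grid_y


if __name__ == "__main__":
    # A 1000 x 3 domain with 16 x 16 blocks. Plain floor division would give
    # (62, 0), which is an empty grid.
    print(grid_dims_2d(1000, 3, 16, 16))  # (63, 1)
```

Because the grid is rounded up, some threads fall outside the domain. The kernel therefore still needs a bounds check such as `if (x < width && y < height)` before touching memory.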