Fix phase kernel launch parameters
Description
Split the number of grid points across the second and third launch dimensions to avoid the limit of 65535 grid points. The solution is similar to what many kernels in grid/batch_ops_inc.F90 already do, e.g. axpy.
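A minimal sketch of the splitting idea, written in plain C so the index arithmetic can be checked on the host. It assumes a CUDA-style cap of 65535 blocks along the second and third grid dimensions. All names (`MAX_BLOCKS_PER_DIM`, `launch_dims`, `pad_to`, `split_points`, `run_emulated_kernel`) are hypothetical and are not the Octopus accel API. The host side caps the second dimension and spills the remainder into the third. The "kernel" side rebuilds a linear point index and skips the padding.

```c
#include <assert.h>

/* Assumption: CUDA limits gridDim.y and gridDim.z to 65535 blocks. */
#define MAX_BLOCKS_PER_DIM 65535L

/* Hypothetical launch shape: global sizes along the second and third dimension. */
typedef struct {
    long dim2;
    long dim3;
} launch_dims;

/* Round n up to the next multiple of wgsize. */
static long pad_to(long n, long wgsize) {
    return ((n + wgsize - 1) / wgsize) * wgsize;
}

/* Split np grid points so that dim2 never needs more than MAX_BLOCKS_PER_DIM
 * work groups of size wgsize; whatever does not fit spills into dim3. */
static launch_dims split_points(long np, long wgsize) {
    assert(np > 0 && wgsize > 0);
    long max_dim2 = MAX_BLOCKS_PER_DIM * wgsize;
    launch_dims d;
    d.dim2 = pad_to(np, wgsize);
    if (d.dim2 > max_dim2) {
        d.dim2 = max_dim2;
    }
    d.dim3 = (np + d.dim2 - 1) / d.dim2; /* ceil(np / dim2) */
    return d;
}

/* Emulate the kernel side: each (i2, i3) work item recovers the linear grid
 * point index ip = i2 + i3 * dim2 and skips padding beyond np.
 * Returns the number of grid points actually processed. */
static long run_emulated_kernel(long np, launch_dims d) {
    long processed = 0;
    for (long i3 = 0; i3 < d.dim3; i3++) {
        for (long i2 = 0; i2 < d.dim2; i2++) {
            long ip = i2 + i3 * d.dim2;
            if (ip < np) {
                processed++;
            }
        }
    }
    return processed;
}
```

For example, with a work-group size of 1, 200000 points become `dim2 = 65535` and `dim3 = 4`. Each point is visited once, and the 62140 padding items fail the `ip < np` guard.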
Checklist
- I have checked that my code follows the Octopus coding standards