HIP: set default StatesBlockSize to warp size
Description
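On AMD GPUs the warp (wavefront) size is commonly 64 threads. The exception is the RDNA architectures, which can run with a warp size of either 32 or 64. With the HIP backend, the default StatesBlockSize is therefore set to the warp size instead of a fixed value.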
https://rocm.docs.amd.com/projects/HIP/en/latest/understand/hardware_implementation.html
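For reviewers, the intended default can be sketched roughly as follows. This is a minimal illustration, not the code in this request: the helper name `default_states_block_size` and the convention that a non-positive `user_value` means "not set in the input file" are assumptions made for the example. In a HIP program the warp size would typically come from the `warpSize` field of `hipDeviceProp_t`.

```cpp
// Hypothetical helper, not part of Octopus: choose the default StatesBlockSize.
// warp_size  : warp size reported by the runtime (64 on most AMD GPUs,
//              32 or 64 on RDNA).
// user_value : StatesBlockSize requested by the user; a value <= 0 is
//              assumed here to mean "not set".
constexpr int default_states_block_size(int warp_size, int user_value) {
    // An explicit user setting always takes precedence over the default.
    return (user_value > 0) ? user_value : warp_size;
}

// Compile-time checks of the behaviour described above.
static_assert(default_states_block_size(64, 0) == 64, "defaults to warp size");
static_assert(default_states_block_size(32, 0) == 32, "RDNA with 32-wide warps");
static_assert(default_states_block_size(64, 4) == 4, "user setting is kept");
```

Tying the default to the reported warp size rather than hard-coding 64 keeps the same behaviour on RDNA devices running with 32-wide warps.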
Checklist
- I have checked that my code follows the Octopus coding standards.
- I have added tests for all the new features added in this request.