Use CUBIN generation instead of PTX for newer CUDA versions
Description
CUBIN generation is supported from CUDA 11.1 onwards; see also https://docs.nvidia.com/cuda/cuda-c-best-practices-guide/index.html#dynamic-code-generation
It avoids incompatibilities between the PTX version produced by the runtime library that we link against and the PTX version the driver supports. Such a mismatch can happen if the runtime library is newer than the driver installed on the system.
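
As a rough illustration of the switch described above, the sketch below picks the NVRTC `--gpu-architecture` value depending on the CUDA version. The helper name `nvrtc_arch_option` and its parameters are hypothetical and are not taken from the Octopus source; the version threshold assumes the `CUDART_VERSION` encoding (1000 * major + 10 * minor).

```cpp
#include <string>

// Hypothetical helper (not from the Octopus source): build the NVRTC
// architecture option for a device with compute capability cc_major.cc_minor.
// cuda_version is assumed to follow the CUDART_VERSION encoding,
// i.e. 1000 * major + 10 * minor (CUDA 11.1 -> 11010).
std::string nvrtc_arch_option(int cuda_version, int cc_major, int cc_minor) {
    // From CUDA 11.1 on, NVRTC accepts a real architecture (sm_XY) and can
    // emit CUBIN; older versions only take a virtual one (compute_XY) for PTX.
    const bool use_cubin = cuda_version >= 11010;
    const std::string target = use_cubin ? "sm_" : "compute_";
    return "--gpu-architecture=" + target +
           std::to_string(cc_major) + std::to_string(cc_minor);
}
```

With an `sm_` target the compiled binary would be retrieved with `nvrtcGetCUBINSize` and `nvrtcGetCUBIN`; with a `compute_` target the PTX is retrieved with `nvrtcGetPTXSize` and `nvrtcGetPTX`. Either result can then be passed to `cuModuleLoadData`.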
News snippet
Use CUBIN generation instead of PTX for newer CUDA versions
Checklist
- I have checked that my code follows the Octopus coding standards.
- I have added tests for all the new features added in this request.