Reorganize CI to test on more GPUs and CUDA versions

My goal is to test:

  • with different CUDA toolkits
  • with/without C++23 enabled (because fp16 is defined differently and other behavior may differ)
  • with OpenCL on non-NVIDIA GPUs (preferably AMD/Intel)

We will see whether I can manage all of this without increasing the build time too much.
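
To get a feel for how quickly the job count grows, here is a rough sketch that enumerates a full matrix next to a reduced one. The axis values (`cuda-A`, `amd`, and so on) and the function names are placeholders I made up for illustration, not the actual CI configuration, and the reduction strategy is only one possible option.

```python
from itertools import product

# Hypothetical axis values, chosen only for illustration; the real CI
# setup may use different toolkits and OpenCL targets.
CUDA_TOOLKITS = ["cuda-A", "cuda-B", "cuda-C"]  # placeholder names
CXX23 = [False, True]
OPENCL_VENDORS = ["amd", "intel"]  # non-NVIDIA OpenCL targets


def full_matrix():
    """Every CUDA toolkit with C++23 off and on, plus one OpenCL job per vendor."""
    jobs = [
        {"backend": "cuda", "toolkit": t, "cxx23": c}
        for t, c in product(CUDA_TOOLKITS, CXX23)
    ]
    jobs += [{"backend": "opencl", "vendor": v, "cxx23": False} for v in OPENCL_VENDORS]
    return jobs


def reduced_matrix():
    """Each toolkit once without C++23, and C++23 only on the last toolkit."""
    jobs = [{"backend": "cuda", "toolkit": t, "cxx23": False} for t in CUDA_TOOLKITS]
    jobs.append({"backend": "cuda", "toolkit": CUDA_TOOLKITS[-1], "cxx23": True})
    jobs += [{"backend": "opencl", "vendor": v, "cxx23": False} for v in OPENCL_VENDORS]
    return jobs


if __name__ == "__main__":
    print(len(full_matrix()), "jobs in the full matrix")
    print(len(reduced_matrix()), "jobs in the reduced matrix")
```

The full cross product grows multiplicatively with every new toolkit, while the reduced variant grows by one job per toolkit, at the cost of testing C++23 against a single toolkit only.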
