Specify -cl-std in kernel compilation, and explicitly enable subgroup extension if on OpenCL < 2.1.
I finally got my AMD GPU drivers set up correctly with OpenCL, and found that Bandicoot did not build successfully unless I explicitly specified the OpenCL version via -cl-std. Also, since the subgroup extension is not a core part of OpenCL until OpenCL 2.1, I needed to add a #define to the generated OpenCL source to enable it.
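
For reviewers, here is a rough sketch of the idea, not the actual code in this PR. It assumes the device's `CL_DEVICE_OPENCL_C_VERSION` string (which has the form `OpenCL C <major.minor> <vendor info>`) is used to build the `-cl-std` option, and it uses the standard `#pragma OPENCL EXTENSION cl_khr_subgroups : enable` line as a stand-in for the line prepended to the generated source. The helper names `cl_std_option` and `kernel_source_prefix` are hypothetical and do not exist in Bandicoot.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: turn a CL_DEVICE_OPENCL_C_VERSION string such as
// "OpenCL C 1.2 <vendor info>" into a "-cl-std=CL1.2" build option.
// Returns an empty string if the version cannot be parsed.
inline std::string cl_std_option(const std::string& opencl_c_version)
{
  const std::string prefix = "OpenCL C ";
  if (opencl_c_version.compare(0, prefix.size(), prefix) != 0)
    return "";

  // The version number runs up to the next space (or the end of the string).
  const std::size_t end = opencl_c_version.find(' ', prefix.size());
  const std::size_t len =
      (end == std::string::npos) ? std::string::npos : end - prefix.size();
  const std::string version = opencl_c_version.substr(prefix.size(), len);

  if (version.empty())
    return "";
  return "-cl-std=CL" + version;
}

// Hypothetical helper: text to prepend to the generated kernel source.
// Assumption (following the description above): subgroups need to be
// enabled explicitly as an extension when the OpenCL version is below 2.1.
inline std::string kernel_source_prefix(const int major, const int minor)
{
  const bool below_2_1 = (major < 2) || (major == 2 && minor < 1);
  if (below_2_1)
    return "#pragma OPENCL EXTENSION cl_khr_subgroups : enable\n";
  return "";
}
```

In this sketch, the string returned by `cl_std_option` would be passed as the options argument of `clBuildProgram`, and the result of `kernel_source_prefix` would be placed before the rest of the kernel source.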
I tested that this works with NVIDIA, AMD, and Intel GPUs, and on a MacBook M1 system.
Will merge tomorrow as part of a release.