cuda: run device detection in temporary driver context; handle compute modes
- Create a temporary context using CUDA Driver API
- Execute device sanity checks inside that context and destroy it after detection
- Add CUDA Driver API error reporting and checking utilities
- Branch on device compute mode during detection:
  - EXCLUSIVE_PROCESS: use runtime cudaSetDevice() to reuse the existing primary context instead of creating a second one
  - PROHIBITED: skip probing and mark device Unavailable
- Map context-creation failures to DeviceStatus::Unavailable rather than NonFunctional when the device is functional but inaccessible
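
The compute-mode branching and status mapping listed above can be sketched roughly as follows. This is an illustrative model only, not the code in this change: the ComputeMode and ProbeStrategy enums, the Compatible status value, and both function names are hypothetical stand-ins, and no CUDA calls are made. Real code would read the compute mode from the device properties and create the context through the Driver API.

```cpp
// Hypothetical stand-ins for the CUDA compute modes relevant to detection.
enum class ComputeMode { Default, ExclusiveProcess, Prohibited };

// Unavailable and NonFunctional come from the change description;
// Compatible is an assumed name for the "all checks passed" case.
enum class DeviceStatus { Compatible, NonFunctional, Unavailable };

// Hypothetical description of how a device gets probed.
enum class ProbeStrategy { TemporaryDriverContext, PrimaryContextViaSetDevice, Skip };

// Pick a probing strategy from the compute mode.
constexpr ProbeStrategy chooseProbeStrategy(ComputeMode mode)
{
    switch (mode)
    {
        // A second context cannot be created, so reuse the primary one.
        case ComputeMode::ExclusiveProcess: return ProbeStrategy::PrimaryContextViaSetDevice;
        // No context may be created at all, so do not probe.
        case ComputeMode::Prohibited: return ProbeStrategy::Skip;
        case ComputeMode::Default: return ProbeStrategy::TemporaryDriverContext;
    }
    return ProbeStrategy::TemporaryDriverContext;
}

// Map the probing outcome to a status. A failure to obtain a context means
// the device is inaccessible, not broken, so it is reported as Unavailable.
constexpr DeviceStatus classifyDevice(ComputeMode mode, bool contextObtained, bool sanityCheckPassed)
{
    if (chooseProbeStrategy(mode) == ProbeStrategy::Skip)
    {
        return DeviceStatus::Unavailable;
    }
    if (!contextObtained)
    {
        return DeviceStatus::Unavailable;
    }
    return sanityCheckPassed ? DeviceStatus::Compatible : DeviceStatus::NonFunctional;
}
```

In this model only a failed sanity check inside a successfully obtained context yields NonFunctional; every access restriction yields Unavailable.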
Running detection in a temporary context avoids tearing down the default context, which prevents issues when an external library or application is using that context independently. The change also fixes the incorrect NonFunctional status reported for devices that are unavailable due to compute mode restrictions.
Edited by Szilárd Páll