nvidia-container-cli: initialization error: nvml error: driver/library version mismatch: unknown")

Sometimes, for reasons I have not been able to pin down, the following error occurs. It is an upstream problem on the NVIDIA side: as the message itself says, the driver and its library report mismatching versions. It is therefore not related to RadDeploy.

500 Server Error for http+docker://localhost/v1.45/containers/5488f3550d9c22723d8ee7b1c6c2d16ddf19610db2b4edfea101ba5c6b4982bf/start: Internal Server Error ("failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy'
consumer-1  | nvidia-container-cli: initialization error: nvml error: driver/library version mismatch: unknown")

In RadDeploy, it manifests itself as if the job was successfully started: the status is running, but there will be a "runtime" in the dashboard. Such an inconsistent state should prompt you to check the log for the error above.
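When the log is long, searching it for the error by hand is tedious. The following is a minimal sketch of how the check could be scripted; it is not part of RadDeploy, and the function name `find_mismatch_lines` is made up for this example. It only assumes that the log text contains the phrase "driver/library version mismatch" exactly as in the error above.

```python
import re

# Signature taken from the error message above; matched case-insensitively
# so that differently capitalised variants of the message are also caught.
MISMATCH_PATTERN = re.compile(r"driver/library version mismatch", re.IGNORECASE)


def find_mismatch_lines(log_text: str) -> list[str]:
    """Return every log line that contains the NVML mismatch signature."""
    return [
        line
        for line in log_text.splitlines()
        if MISMATCH_PATTERN.search(line)
    ]
```

Feed it the text of the consumer log, however you collect it. An empty result means the signature was not found, so the inconsistent state probably has another cause.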

The only solution I know of is to reboot the computer, which has solved the issue every time I have experienced it.
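To confirm that a reboot actually cleared the problem before starting jobs again, a quick check on the host could look like the sketch below. It assumes that `nvidia-smi` is installed on the host and that, in the broken state, it prints the same "driver/library version mismatch" text as the container runtime does. The function name `nvml_mismatch` and its `command` parameter are hypothetical and exist only for this example.

```python
import subprocess

# Assumption: the host tool reports the same phrase as the container error.
MISMATCH_SIGNATURE = "driver/library version mismatch"


def nvml_mismatch(command=("nvidia-smi",)) -> bool:
    """Return True if the command's output reports the NVML version mismatch."""
    try:
        result = subprocess.run(
            list(command),
            capture_output=True,
            text=True,
            timeout=30,
        )
    except (FileNotFoundError, subprocess.TimeoutExpired):
        # Tool missing or hung: this sketch cannot decide, so it reports
        # "no mismatch detected" rather than guessing.
        return False
    # The message may land on stdout or stderr, so check both.
    output = (result.stdout + result.stderr).lower()
    return MISMATCH_SIGNATURE in output
```

If `nvml_mismatch()` returns `True`, the host is still in the broken state and new GPU jobs are likely to fail in the same way. A `False` result only means that the signature was not seen; it does not prove that the GPU is healthy.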