Using Podman as Docker executor for GitLab runner results in GPU (driver) not being detected
Note: Issue initially posted here (GitLab forum).
Setup
- Ubuntu Server 24.04 with kernel 6.8.0-1024-oracle
- Nvidia driver 580.82.07
- GitLab CE (self-hosted)
- CUDA 13.0
- Nvidia Container Toolkit 1.18.0 (latest)
- GitLab runner
  Version:      18.5.0
  Git revision: bda84871
  Git branch:   18-5-stable
  GO version:   go1.24.6 X:cacheprog
  Built:        2025-10-13T19:20:30Z
  OS/Arch:      linux/amd64
- Podman 4.9.3
- CDI 0.5.0 (based on /etc/cdi/nvidia.yaml; generation command below)
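For reference, /etc/cdi/nvidia.yaml is the specification produced by the standard Nvidia CTK generation command; the following is a sketch of the invocation (the exact flags used on this host may have differed slightly):

sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml   # generate/refresh the CDI specification
nvidia-ctk cdi list                                          # verify that the GPU devices are listed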
Issue
Using Podman as the Docker executor of a GitLab runner creates containers without GPU access even though the gpus setting is present in the runner's configuration. Using the official Nvidia CUDA/cuDNN images as well as the PyTorch images results in a warning that the Nvidia driver cannot be detected inside the container.
Description
I am trying to configure a group runner with GPU support for PyTorch as well as pure CUDA workloads. The runner has been successfully registered and CPU-only CI jobs work without any issues.
However, GPU support appears to be missing. I started testing with a simple nvidia-smi call:
stages:
  - checks

check_cuda:
  stage: checks
  tags:
    - gpu
    - linux
    - ml
  allow_failure: true
  variables:
    GIT_STRATEGY: none
  script:
    - nvidia-smi
which is what is recommended in the Nvidia CTK, GitLab and Podman documentation on containers/runners with GPU support. As the base image I tried multiple versions of the official Nvidia Docker image with the CUDA and cuDNN runtime, as well as PyTorch images with various CUDA and cuDNN versions and combinations of both.
After the initial pull (if required) and the triggering of the Docker executor (here actually Podman), the job log shows the following warning (the image used in the example below is nvcr.io/nvidia/cuda:12.9.1-cudnn-runtime-ubuntu24.04):
Using effective pull policy of [always] for container nvidia/cuda:12.9.1-cudnn-runtime-ubuntu24.04
Using docker image sha256:1fb7ebfe77ba724e8fdaf90c63f2dce7e42ec92afa2c48b3a5547812f645c377 for nvidia/cuda:12.9.1-cudnn-runtime-ubuntu24.04 with digest nvcr.io/nvidia/cuda@sha256:bcf8f5037535884fffbde1c1584af29e9eccc3f432d1cb05a5216a1184af12d8 ...
==========
== CUDA ==
==========
CUDA Version 12.9.1
Container image Copyright (c) 2016-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license
A copy of this license is made available in this container at /NGC-DL-CONTAINER-LICENSE for your convenience.
WARNING: The NVIDIA Driver was not detected. GPU functionality will not be available.
Use the NVIDIA Container Toolkit to start this container with GPU support; see
https://docs.nvidia.com/datacenter/cloud-native/ .
The job then fails with nvidia-smi: command not found. As the GitLab article on GPU support states:
If the hardware does not support a GPU, nvidia-smi should fail either because it’s missing or because it can’t communicate with the driver:
The GitLab runner's configuration at /etc/gitlab-runner/config.toml looks as follows:
concurrent = 8
check_interval = 0
connection_max_age = "15m0s"
shutdown_timeout = 0
log_level = "info"
[session_server]
session_timeout = 1800
[[runners]]
name = "gpu-runner"
url = "https://XXXXXXXXXXXXXXXXXx"
id = 45874
token = "glrt-YYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYY"
token_obtained_at = 2025-10-29T15:29:11Z
token_expires_at = 0001-01-01T00:00:00Z
executor = "docker"
environment = [
"NVIDIA_DRIVER_CAPABILITIES=all"
]
[runners.cache]
MaxUploadedArchiveSize = 0
[runners.cache.s3]
[runners.cache.gcs]
[runners.cache.azure]
[runners.docker]
host = "unix:///run/podman/podman.sock"
tls_verify = false
#runtime = "nvidia"
# image = "ubuntu:latest" # No nvidia-smi and CUDA
# image = "nvidia/cuda:12.2.0-base-ubuntu22.04" # No cuDNN
image = "nvidia/cuda:13.0.1-cudnn-runtime-ubuntu24.04"
#image = "nvidia/cuda:12.9.1-cudnn-runtime-ubuntu24.04"
privileged = false
disable_entrypoint_overwrite = false
oom_kill_disable = false
disable_cache = false
volumes = ["/cache"]
shm_size = 0
network_mtu = 0
# devices = [
# "nvidia.com/gpu=0",
# "nvidia.com/gpu=1",
# ]
# environment = [
# https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/1.10.0/user-guide.html#gpu-enumeration
# "NVIDIA_VISIBLE_DEVICES=all",
# "NVIDIA_VISIBLE_DEVICES=nvidia.com/gpu=all",
# https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/1.10.0/user-guide.html#driver-capabilities
# "NVIDIA_DRIVER_CAPABILITIES=all"
# ]
gpus = "all"
# gpus = "0"
# gpus = "nvidia.com/gpu=all"
# service_gpus = "all"
# service_gpus = "0"
# service_gpus = "nvidia.com/gpu=all"
allowed_pull_policies = ["always", "if-not-present"]
As you can see, I have tried multiple things. The very first option was to add gpus = "all", even though GitLab's documentation on this mentions gpus or service_gpus but then continues by showing a configuration snippet containing both, without really explaining the difference. I can only assume that service_gpus is meant for service containers and that having both is not a conflict; I tried each on its own before leaving both of them in the configuration file. It did not resolve the issue. I then tried adding the environment variables recommended for Docker to make the GPUs accessible. Still nothing. Changing images didn't help either. Adding the environment variables to the CI job - same result.
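A quick manual check (outside the runner) of whether the environment variables alone have any effect with this Podman setup could look like the following sketch:

# Environment variables only, no --device/--gpus flag:
podman run --rm \
  -e NVIDIA_VISIBLE_DEVICES=all \
  -e NVIDIA_DRIVER_CAPABILITIES=all \
  --security-opt=label=disable \
  nvidia/cuda:12.9.1-cudnn-runtime-ubuntu24.04 nvidia-smi
# If the same "NVIDIA Driver was not detected" warning appears, the variables
# alone do not attach the GPUs and a --device (CDI) flag is still needed.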
The reason why I consider this to be an issue with the GitLab runner and not Podman itself is based on several observations, namely
- Nvidia CTK generates the CDI specification for the underlying hardware
- Nvidia CTK lists the hardware and clearly shows that the CDI specification is in place:
  nvidia-ctk cdi list
  INFO[0000] Found 5 CDI devices
  nvidia.com/gpu=0
  nvidia.com/gpu=1
  nvidia.com/gpu=GPU-2aff26da-3664-9eeb-13ba-b78397cace6f
  nvidia.com/gpu=GPU-66878602-8286-6421-1ec4-8d097b71be4e
  nvidia.com/gpu=all
- Podman containers, started manually, have full GPU support:
  podman run --rm --device nvidia.com/gpu=0 --device nvidia.com/gpu=1 --security-opt=label=disable -it nvidia/cuda:12.9.1-cudnn-runtime-ubuntu24.04 nvidia-smi

  ==========
  == CUDA ==
  ==========

  CUDA Version 12.9.1

  Container image Copyright (c) 2016-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
  This container image and its contents are governed by the NVIDIA Deep Learning Container License.
  By pulling and using the container, you accept the terms and conditions of this license:
  https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license
  A copy of this license is made available in this container at /NGC-DL-CONTAINER-LICENSE for your convenience.

  Wed Nov  5 04:06:02 2025
  +-----------------------------------------------------------------------------------------+
  | NVIDIA-SMI 580.82.07              Driver Version: 580.82.07      CUDA Version: 13.0     |
  +-----------------------------------------+------------------------+----------------------+
  | GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
  | Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
  |                                         |                        |               MIG M. |
  |=========================================+========================+======================|
  |   0  NVIDIA H100 NVL                On  |   00000000:01:00.0 Off |                    0 |
  | N/A   43C    P0            205W /  400W |   49169MiB /  95830MiB |    100%      Default |
  |                                         |                        |             Disabled |
  +-----------------------------------------+------------------------+----------------------+
  |   1  NVIDIA H100 NVL                On  |   00000000:02:00.0 Off |                    0 |
  | N/A   31C    P0             61W /  400W |      17MiB /  95830MiB |      0%      Default |
  |                                         |                        |             Disabled |
  +-----------------------------------------+------------------------+----------------------+

  +-----------------------------------------------------------------------------------------+
  | Processes:                                                                              |
  |  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
  |        ID   ID                                                               Usage      |
  |=========================================================================================|
  |  No running processes found                                                             |
  +-----------------------------------------------------------------------------------------+
I modified the CI job to use tail -f /dev/null in order to keep it running, after a colleague of mine suggested that I enter the container created by the GitLab runner and compare it with the one I had started manually (which works).
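Entering the runner-created container can then be done roughly like this (the container ID is a placeholder):

podman ps                                                  # find the job container started by the runner
podman exec -it <container-id> /bin/bash                   # enter it
ls /usr/lib/x86_64-linux-gnu | grep -i -E 'cuda|nvidia'    # look for CUDA/driver libraries
command -v nvidia-smi                                      # check whether the driver binary is present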
I found out that inside the GitLab runner's container the Nvidia driver libraries and binaries (e.g. libcuda.so and nvidia-smi) are missing. Below is a listing of all the libraries found inside /usr/lib/x86_64-linux-gnu/:
e2fsprogs libcap-ng.so.0 libdb-5.3.so libhogweed.so.6 libncursesw.so.6 libproc2.so.0.0.2 libsystemd.so.0.38.0
engines-3 libcap-ng.so.0.0.0 libdebconfclient.so.0 libhogweed.so.6.8 libncursesw.so.6.4 libpsx.so.2 libtasn1.so.6
gconv libcap.so.2 libdebconfclient.so.0.0.0 libidn2.so.0 libnettle.so.8 libpsx.so.2.66 libtasn1.so.6.6.3
ld-linux-x86-64.so.2 libcap.so.2.66 libdl.so.2 libidn2.so.0.4.0 libnettle.so.8.8 libpthread.so.0 libthread_db.so.1
libBrokenLocale.so.1 libcom_err.so.2 libdrop_ambient.so.0 libksba.so.8 libnpth.so.0 libreadline.so.8 libtic.so.6
libacl.so.1 libcom_err.so.2.1 libdrop_ambient.so.0.0.0 libksba.so.8.14.6 libnpth.so.0.1.2 libreadline.so.8.2 libtic.so.6.4
libacl.so.1.1.2302 libcrypt.so.1 libe2p.so.2 liblber.so.2 libnsl.so.1 libresolv.so.2 libtinfo.so.6
libanl.so.1 libcrypt.so.1.1.0 libe2p.so.2.3 liblber.so.2.0.200 libnss_compat.so.2 librt.so.1 libtinfo.so.6.4
libapt-pkg.so.6.0 libcrypto.so.3 libext2fs.so.2 libldap.so.2 libnss_dns.so.2 libsasl2.so.2 libudev.so.1
libapt-pkg.so.6.0.0 libcudnn.so.9 libext2fs.so.2.4 libldap.so.2.0.200 libnss_files.so.2 libsasl2.so.2.0.25 libudev.so.1.7.8
libapt-private.so.0.0 libcudnn.so.9.10.2 libffi.so.8 liblz4.so.1 libnss_hesiod.so.2 libseccomp.so.2 libunistring.so.5
libapt-private.so.0.0.0 libcudnn_adv.so.9 libffi.so.8.1.4 liblz4.so.1.9.4 libp11-kit.so.0 libseccomp.so.2.5.5 libunistring.so.5.0.0
libassuan.so.0 libcudnn_adv.so.9.10.2 libformw.so.6 liblzma.so.5 libp11-kit.so.0.3.1 libselinux.so.1 libutil.so.1
libassuan.so.0.8.6 libcudnn_cnn.so.9 libformw.so.6.4 liblzma.so.5.4.5 libpam.so.0 libsemanage.so.2 libuuid.so.1
libattr.so.1 libcudnn_cnn.so.9.10.2 libgcc_s.so.1 libm.so.6 libpam.so.0.85.1 libsepol.so.2 libuuid.so.1.3.0
libattr.so.1.1.2502 libcudnn_engines_precompiled.so.9 libgcrypt.so.20 libmd.so.0 libpam_misc.so.0 libsmartcols.so.1 libxxhash.so.0
libaudit.so.1 libcudnn_engines_precompiled.so.9.10.2 libgcrypt.so.20.4.3 libmd.so.0.1.0 libpam_misc.so.0.82.1 libsmartcols.so.1.1.0 libxxhash.so.0.8.2
libaudit.so.1.0.0 libcudnn_engines_runtime_compiled.so.9 libgmp.so.10 libmemusage.so libpamc.so.0 libsqlite3.so.0 libz.so.1
libblkid.so.1 libcudnn_engines_runtime_compiled.so.9.10.2 libgmp.so.10.5.0 libmenuw.so.6 libpamc.so.0.82.1 libsqlite3.so.0.8.6 libz.so.1.3
libblkid.so.1.1.0 libcudnn_graph.so.9 libgnutls.so.30 libmenuw.so.6.4 libpanelw.so.6 libss.so.2 libzstd.so.1
libbz2.so.1 libcudnn_graph.so.9.10.2 libgnutls.so.30.37.1 libmount.so.1 libpanelw.so.6.4 libss.so.2.0 libzstd.so.1.5.5
libbz2.so.1.0 libcudnn_heuristic.so.9 libgpg-error.so.0 libmount.so.1.1.0 libpcprofile.so libssl.so.3 ossl-modules
libbz2.so.1.0.4 libcudnn_heuristic.so.9.10.2 libgpg-error.so.0.34.0 libmvec.so.1 libpcre2-8.so.0 libstdc++.so.6 perl-base
libc.so.6 libcudnn_ops.so.9 libhistory.so.8 libnccl.so.2 libpcre2-8.so.0.11.2 libstdc++.so.6.0.33 sasl2
libc_malloc_debug.so.0 libcudnn_ops.so.9.10.2 libhistory.so.8.2 libnccl.so.2.27.3 libproc2.so.0 libsystemd.so.0 security
compared to the contents of the same directory in a container using the same image but started manually:
e2fsprogs libcudadebugger.so.580.82.07 liblber.so.2 libnvidia-glsi.so.580.82.07 libreadline.so.8.2
engines-3 libcudnn.so.9 liblber.so.2.0.200 libnvidia-glvkspirv.so.580.82.07 libresolv.so.2
gbm libcudnn.so.9.10.2 libldap.so.2 libnvidia-gpucomp.so.580.82.07 librt.so.1
gconv libcudnn_adv.so.9 libldap.so.2.0.200 libnvidia-gtk2.so.580.82.07 libsasl2.so.2
ld-linux-x86-64.so.2 libcudnn_adv.so.9.10.2 liblz4.so.1 libnvidia-gtk3.so.580.82.07 libsasl2.so.2.0.25
libBrokenLocale.so.1 libcudnn_cnn.so.9 liblz4.so.1.9.4 libnvidia-ml.so.1 libseccomp.so.2
libEGL_nvidia.so.0 libcudnn_cnn.so.9.10.2 liblzma.so.5 libnvidia-ml.so.580.82.07 libseccomp.so.2.5.5
libEGL_nvidia.so.580.82.07 libcudnn_engines_precompiled.so.9 liblzma.so.5.4.5 libnvidia-ngx.so.1 libselinux.so.1
libGLESv1_CM_nvidia.so.1 libcudnn_engines_precompiled.so.9.10.2 libm.so.6 libnvidia-ngx.so.580.82.07 libsemanage.so.2
libGLESv1_CM_nvidia.so.580.82.07 libcudnn_engines_runtime_compiled.so.9 libmd.so.0 libnvidia-nvvm.so.4 libsepol.so.2
libGLESv2_nvidia.so.2 libcudnn_engines_runtime_compiled.so.9.10.2 libmd.so.0.1.0 libnvidia-nvvm.so.580.82.07 libsmartcols.so.1
libGLESv2_nvidia.so.580.82.07 libcudnn_graph.so.9 libmemusage.so libnvidia-opencl.so.1 libsmartcols.so.1.1.0
libGLX_indirect.so.0 libcudnn_graph.so.9.10.2 libmenuw.so.6 libnvidia-opencl.so.580.82.07 libsqlite3.so.0
libGLX_nvidia.so.0 libcudnn_heuristic.so.9 libmenuw.so.6.4 libnvidia-opticalflow.so libsqlite3.so.0.8.6
libGLX_nvidia.so.580.82.07 libcudnn_heuristic.so.9.10.2 libmount.so.1 libnvidia-opticalflow.so.1 libss.so.2
libacl.so.1 libcudnn_ops.so.9 libmount.so.1.1.0 libnvidia-opticalflow.so.580.82.07 libss.so.2.0
libacl.so.1.1.2302 libcudnn_ops.so.9.10.2 libmvec.so.1 libnvidia-pkcs11-openssl3.so.580.82.07 libssl.so.3
libanl.so.1 libdb-5.3.so libnccl.so.2 libnvidia-present.so.580.82.07 libstdc++.so.6
libapt-pkg.so.6.0 libdebconfclient.so.0 libnccl.so.2.27.3 libnvidia-ptxjitcompiler.so.1 libstdc++.so.6.0.33
libapt-pkg.so.6.0.0 libdebconfclient.so.0.0.0 libncursesw.so.6 libnvidia-ptxjitcompiler.so.580.82.07 libsystemd.so.0
libapt-private.so.0.0 libdl.so.2 libncursesw.so.6.4 libnvidia-rtcore.so.580.82.07 libsystemd.so.0.38.0
libapt-private.so.0.0.0 libdrop_ambient.so.0 libnettle.so.8 libnvidia-sandboxutils.so.1 libtasn1.so.6
libassuan.so.0 libdrop_ambient.so.0.0.0 libnettle.so.8.8 libnvidia-sandboxutils.so.580.82.07 libtasn1.so.6.6.3
libassuan.so.0.8.6 libe2p.so.2 libnpth.so.0 libnvidia-tls.so.580.82.07 libthread_db.so.1
libattr.so.1 libe2p.so.2.3 libnpth.so.0.1.2 libnvidia-vksc-core.so.1 libtic.so.6
libattr.so.1.1.2502 libext2fs.so.2 libnsl.so.1 libnvidia-vksc-core.so.580.82.07 libtic.so.6.4
libaudit.so.1 libext2fs.so.2.4 libnss_compat.so.2 libnvidia-wayland-client.so.580.82.07 libtinfo.so.6
libaudit.so.1.0.0 libffi.so.8 libnss_dns.so.2 libnvoptix.so.1 libtinfo.so.6.4
libblkid.so.1 libffi.so.8.1.4 libnss_files.so.2 libnvoptix.so.580.82.07 libudev.so.1
libblkid.so.1.1.0 libformw.so.6 libnss_hesiod.so.2 libp11-kit.so.0 libudev.so.1.7.8
libbz2.so.1 libformw.so.6.4 libnvcuvid.so libp11-kit.so.0.3.1 libunistring.so.5
libbz2.so.1.0 libgcc_s.so.1 libnvcuvid.so.1 libpam.so.0 libunistring.so.5.0.0
libbz2.so.1.0.4 libgcrypt.so.20 libnvcuvid.so.580.82.07 libpam.so.0.85.1 libutil.so.1
libc.so.6 libgcrypt.so.20.4.3 libnvidia-allocator.so.1 libpam_misc.so.0 libuuid.so.1
libc_malloc_debug.so.0 libgmp.so.10 libnvidia-allocator.so.580.82.07 libpam_misc.so.0.82.1 libuuid.so.1.3.0
libcap-ng.so.0 libgmp.so.10.5.0 libnvidia-cfg.so.1 libpamc.so.0 libxxhash.so.0
libcap-ng.so.0.0.0 libgnutls.so.30 libnvidia-cfg.so.580.82.07 libpamc.so.0.82.1 libxxhash.so.0.8.2
libcap.so.2 libgnutls.so.30.37.1 libnvidia-egl-gbm.so.1 libpanelw.so.6 libz.so.1
libcap.so.2.66 libgpg-error.so.0 libnvidia-egl-gbm.so.1.1.2 libpanelw.so.6.4 libz.so.1.3
libcom_err.so.2 libgpg-error.so.0.34.0 libnvidia-egl-wayland.so.1 libpcprofile.so libzstd.so.1
libcom_err.so.2.1 libhistory.so.8 libnvidia-egl-wayland.so.1.1.19 libpcre2-8.so.0 libzstd.so.1.5.5
libcrypt.so.1 libhistory.so.8.2 libnvidia-eglcore.so.580.82.07 libpcre2-8.so.0.11.2 nvidia
libcrypt.so.1.1.0 libhogweed.so.6 libnvidia-encode.so libproc2.so.0 ossl-modules
libcrypto.so.3 libhogweed.so.6.8 libnvidia-encode.so.1 libproc2.so.0.0.2 perl-base
libcuda.so libidn2.so.0 libnvidia-encode.so.580.82.07 libpsx.so.2 sasl2
libcuda.so.1 libidn2.so.0.4.0 libnvidia-fbc.so.1 libpsx.so.2.66 security
libcuda.so.580.82.07 libksba.so.8 libnvidia-fbc.so.580.82.07 libpthread.so.0 vdpau
libcudadebugger.so.1 libksba.so.8.14.6 libnvidia-glcore.so.580.82.07 libreadline.so.8
The difference between the CUDA and the cuDNN parts of the setup is that cuDNN is essentially a set of libraries downloaded from Nvidia's website and copied into the library directory where the CUDA libraries live (and is therefore already baked into the image), while the CUDA/driver libraries come from an actual installation and have to be provided to the container when it is created. This means that the step which makes the driver and CUDA libraries available inside the container fails when the container is created via the GitLab runner.
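Given the listings above, one way to make this visible is to check for the driver-side libraries in both containers (a sketch; exact counts will vary):

# Inside the container created by the GitLab runner (first listing above):
ls /usr/lib/x86_64-linux-gnu | grep -c libnvidia   # prints 0 - no driver libraries present
# Inside the manually started container (second listing above):
ls /usr/lib/x86_64-linux-gnu | grep -c libnvidia   # prints a non-zero count of driver libraries
ldconfig -p | grep libcuda.so                      # should only resolve in the working container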
Possible issue
Even though Podman is supposed to be a drop-in replacement for Docker and can thus (supposedly) be used interchangeably as an executor for the GitLab runner, I suspect that the gpus setting, which the runner passes on to Podman, is the problem here.
Podman, being a CDI-based tool, uses a different flag, namely --device, together with the CDI notation (e.g. nvidia.com/gpu=all), compared to Docker, which uses --gpus together with its own notation (e.g. all).
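For illustration, the two notations side by side (the Docker line is shown only for comparison and assumes a Docker host with the Nvidia Container Toolkit configured):

# Docker: GPU access via the --gpus flag and Docker's own notation
docker run --rm --gpus all nvidia/cuda:12.9.1-cudnn-runtime-ubuntu24.04 nvidia-smi
# Podman: GPU access via CDI and the generic --device flag
podman run --rm --device nvidia.com/gpu=all --security-opt=label=disable nvidia/cuda:12.9.1-cudnn-runtime-ubuntu24.04 nvidia-smi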
I tried to roughly reproduce the arguments the runner passes to Podman manually (sadly I do not know how to obtain the exact invocation) and managed to reproduce the same warning when trying to run nvidia-smi:
podman run --rm --gpus all --security-opt=label=disable -it nvidia/cuda:12.9.1-cudnn-runtime-ubuntu24.04 nvidia-smi
==========
== CUDA ==
==========
CUDA Version 12.9.1
Container image Copyright (c) 2016-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license
A copy of this license is made available in this container at /NGC-DL-CONTAINER-LICENSE for your convenience.
WARNING: The NVIDIA Driver was not detected. GPU functionality will not be available.
Use the NVIDIA Container Toolkit to start this container with GPU support; see
https://docs.nvidia.com/datacenter/cloud-native/ .
/opt/nvidia/nvidia_entrypoint.sh: line 67: exec: nvidia-smi: not found
This clearly leads to the conclusion that Podman is not fully compatible with Docker as a GitLab runner executor when it comes to GPU support.