Using Podman as Docker executor for GitLab runner results in GPU (driver) not being detected

Note: Issue initially posted here (GitLab forum).

Setup

  • Ubuntu Server 24.04 with kernel 6.8.0-1024-oracle

  • Nvidia driver 580.82.07

  • GitLab CE (self-hosted)

  • CUDA 13.0

  • Nvidia Container Toolkit 1.18.0 (latest)

  • GitLab runner

    Version:      18.5.0
    Git revision: bda84871
    Git branch:   18-5-stable
    GO version:   go1.24.6 X:cacheprog
    Built:        2025-10-13T19:20:30Z
    OS/Arch:      linux/amd64
  • Podman 4.9.3

  • CDI 0.5.0 (based on /etc/cdi/nvidia.yaml)
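For completeness, the CDI specification listed above was generated and verified with Nvidia CTK's documented commands (shown as a reference; the output path matches the setup described here):

    sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml
    nvidia-ctk cdi list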

Issue

Using Podman as the Docker executor in a GitLab runner creates containers without GPU access even though the gpus setting is present in the runner's configuration. Using the official Nvidia CUDA/cuDNN images as well as PyTorch images results in a warning that the Nvidia driver cannot be detected inside the container.

Description

I am trying to configure a group runner with GPU support for PyTorch as well as pure CUDA workloads. The runner has been registered successfully and CPU-only CI jobs work without any issues.

However, GPU support appears to be missing. I started testing with a simple nvidia-smi call:

stages:
  - checks

check_cuda:
  stage: checks
  tags:
    - gpu
    - linux
    - ml
  allow_failure: true
  variables:
    GIT_STRATEGY: none
  script:
    - nvidia-smi

which is what Nvidia CTK's, GitLab's own, and Podman's documentation recommend for containers/runners with GPU support. As a base image I tried multiple versions of the official Nvidia Docker image with the CUDA and cuDNN runtime, as well as PyTorch images with various CUDA and cuDNN versions and combinations of both.

After the initial pull (if required), the Docker executor (here actually Podman) starts the container and the log shows the following warning (the image used in the example below is nvcr.io/nvidia/cuda:12.9.1-cudnn-runtime-ubuntu24.04):

Using effective pull policy of [always] for container nvidia/cuda:12.9.1-cudnn-runtime-ubuntu24.04
Using docker image sha256:1fb7ebfe77ba724e8fdaf90c63f2dce7e42ec92afa2c48b3a5547812f645c377 for nvidia/cuda:12.9.1-cudnn-runtime-ubuntu24.04 with digest nvcr.io/nvidia/cuda@sha256:bcf8f5037535884fffbde1c1584af29e9eccc3f432d1cb05a5216a1184af12d8 ...
==========
== CUDA ==
==========
CUDA Version 12.9.1
Container image Copyright (c) 2016-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license
A copy of this license is made available in this container at /NGC-DL-CONTAINER-LICENSE for your convenience.
WARNING: The NVIDIA Driver was not detected.  GPU functionality will not be available.
   Use the NVIDIA Container Toolkit to start this container with GPU support; see
   https://docs.nvidia.com/datacenter/cloud-native/ .

The job then fails with nvidia-smi: command not found. As the GitLab article on GPU support states:

If the hardware does not support a GPU, nvidia-smi should fail either because it’s missing or because it can’t communicate with the driver:
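
In this case it is the former. To tell the two failure modes apart inside a job, a minimal shell check like the following could be added to the script section (a sketch; it assumes only a POSIX-ish shell in the image):

    # Distinguish "binary missing" from "driver unreachable"
    if ! command -v nvidia-smi >/dev/null 2>&1; then
      echo "nvidia-smi not present: driver libraries were never injected"
    elif ! nvidia-smi; then
      echo "nvidia-smi is present but cannot communicate with the driver"
    fi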

The GitLab runner's configuration at /etc/gitlab-runner/config.toml looks as follows:

concurrent = 8
check_interval = 0
connection_max_age = "15m0s"
shutdown_timeout = 0

log_level = "info"

[session_server]
  session_timeout = 1800

[[runners]]
  name = "gpu-runner"
  url = "https://XXXXXXXXXXXXXXXXXx"
  id = 45874
  token = "glrt-YYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYY"
  token_obtained_at = 2025-10-29T15:29:11Z
  token_expires_at = 0001-01-01T00:00:00Z
  executor = "docker"
  environment = [
    "NVIDIA_DRIVER_CAPABILITIES=all"
  ]
  [runners.cache]
    MaxUploadedArchiveSize = 0
    [runners.cache.s3]
    [runners.cache.gcs]
    [runners.cache.azure]
  [runners.docker]
    host = "unix:///run/podman/podman.sock"
    tls_verify = false
    #runtime = "nvidia"
    # image = "ubuntu:latest" # No nvidia-smi and CUDA
    # image = "nvidia/cuda:12.2.0-base-ubuntu22.04" # No cuDNN
    image = "nvidia/cuda:13.0.1-cudnn-runtime-ubuntu24.04"
    #image = "nvidia/cuda:12.9.1-cudnn-runtime-ubuntu24.04"
    privileged = false
    disable_entrypoint_overwrite = false
    oom_kill_disable = false
    disable_cache = false
    volumes = ["/cache"]
    shm_size = 0
    network_mtu = 0
#    devices = [
#      "nvidia.com/gpu=0",
#      "nvidia.com/gpu=1",
#    ]
#    environment = [
# https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/1.10.0/user-guide.html#gpu-enumeration
#      "NVIDIA_VISIBLE_DEVICES=all",
#      "NVIDIA_VISIBLE_DEVICES=nvidia.com/gpu=all",
# https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/1.10.0/user-guide.html#driver-capabilities
#      "NVIDIA_DRIVER_CAPABILITIES=all"
#    ]
    gpus = "all"
#    gpus = "0"
#    gpus = "nvidia.com/gpu=all"
#    service_gpus = "all"
#    service_gpus = "0"
#    service_gpus = "nvidia.com/gpu=all"
    allowed_pull_policies = ["always", "if-not-present"]
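
For anyone reproducing this, it is worth confirming first that the Podman socket referenced under host is actually live (a quick check; it assumes curl is available and relies on the _ping endpoint of the Docker-compatible API that Podman serves):

    systemctl status podman.socket
    curl -s --unix-socket /run/podman/podman.sock http://localhost/_ping   # expects "OK"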

As you can see, I have tried multiple things. The very first option was to add gpus = "all", even though GitLab's documentation on this states

gpus or service_gpus

but then continues with a configuration snippet containing both, without really explaining the difference. I can only assume that service_gpus applies to service containers and that having both is not a conflict; I tried each on its own before leaving both in the configuration file. It did not resolve the issue. I then tried adding the environment variables recommended for Docker to make GPUs accessible. Still nothing. Changing images didn't help either. Adding the environment variables to the CI job produced the same result.
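
That last behaviour can also be reproduced outside the runner: on this setup the environment variables alone do not trigger any library injection under Podman, while a CDI --device request does (a sketch using the image from above; libcuda.so.1 serves as the injection marker):

    # Env vars only: no injection, the driver library is absent
    podman run --rm -e NVIDIA_VISIBLE_DEVICES=all -e NVIDIA_DRIVER_CAPABILITIES=all \
      nvidia/cuda:12.9.1-cudnn-runtime-ubuntu24.04 ls /usr/lib/x86_64-linux-gnu/libcuda.so.1

    # CDI device request: the driver library is injected from the host
    podman run --rm --device nvidia.com/gpu=all \
      nvidia/cuda:12.9.1-cudnn-runtime-ubuntu24.04 ls /usr/lib/x86_64-linux-gnu/libcuda.so.1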

The reason why I consider this to be an issue with the GitLab runner and not Podman itself is based on several observations, namely

  • Nvidia CTK generates the CDI specification for the underlying hardware

  • Nvidia CTK lists the hardware and clearly shows that the CDI specification is in place

    nvidia-ctk cdi list
    INFO[0000] Found 5 CDI devices                          
    nvidia.com/gpu=0
    nvidia.com/gpu=1
    nvidia.com/gpu=GPU-2aff26da-3664-9eeb-13ba-b78397cace6f
    nvidia.com/gpu=GPU-66878602-8286-6421-1ec4-8d097b71be4e
    nvidia.com/gpu=all
  • Podman containers, started manually, have full GPU support

    podman run --rm --device nvidia.com/gpu=0 --device nvidia.com/gpu=1 \
        --security-opt=label=disable -it nvidia/cuda:12.9.1-cudnn-runtime-ubuntu24.04 nvidia-smi
    
    ==========
    == CUDA ==
    ==========
    
    CUDA Version 12.9.1
    
    Container image Copyright (c) 2016-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
    
    This container image and its contents are governed by the NVIDIA Deep Learning Container License.
    By pulling and using the container, you accept the terms and conditions of this license:
    https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license
    
    A copy of this license is made available in this container at /NGC-DL-CONTAINER-LICENSE for your convenience.
    
    Wed Nov  5 04:06:02 2025       
    +-----------------------------------------------------------------------------------------+
    | NVIDIA-SMI 580.82.07              Driver Version: 580.82.07      CUDA Version: 13.0     |
    +-----------------------------------------+------------------------+----------------------+
    | GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
    |                                         |                        |               MIG M. |
    |=========================================+========================+======================|
    |   0  NVIDIA H100 NVL                On  |   00000000:01:00.0 Off |                    0 |
    | N/A   43C    P0            205W /  400W |   49169MiB /  95830MiB |    100%      Default |
    |                                         |                        |             Disabled |
    +-----------------------------------------+------------------------+----------------------+
    |   1  NVIDIA H100 NVL                On  |   00000000:02:00.0 Off |                    0 |
    | N/A   31C    P0             61W /  400W |      17MiB /  95830MiB |      0%      Default |
    |                                         |                        |             Disabled |
    +-----------------------------------------+------------------------+----------------------+
    
    +-----------------------------------------------------------------------------------------+
    | Processes:                                                                              |
    |  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
    |        ID   ID                                                               Usage      |
    |=========================================================================================|
    |  No running processes found                                                             |
    +-----------------------------------------------------------------------------------------+

I modified the CI job to run tail -f /dev/null in order to keep the container alive, after a colleague of mine suggested that I enter the container created by the GitLab runner and compare it with the one I had started manually (which works).
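
For reference, inspecting the idling job container from the host looks roughly like this (the runner-… name pattern is an assumption based on the runner's usual container naming):

    podman ps --format '{{.ID}} {{.Names}}'         # find the job container (named runner-…)
    podman exec -it <container-id> bash             # enter it
    ls /usr/lib/x86_64-linux-gnu | grep -i nvidia   # returns nothing in the runner's container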

I found out that inside the GitLab runner's container all driver-provided CUDA libraries and binaries (e.g. libcuda.so, nvidia-smi) are missing. Below is a listing of all the libraries found inside /usr/lib/x86_64-linux-gnu/:

e2fsprogs                libcap-ng.so.0                               libdb-5.3.so               libhogweed.so.6     libncursesw.so.6       libproc2.so.0.0.2      libsystemd.so.0.38.0
engines-3                libcap-ng.so.0.0.0                           libdebconfclient.so.0      libhogweed.so.6.8   libncursesw.so.6.4     libpsx.so.2            libtasn1.so.6
gconv                    libcap.so.2                                  libdebconfclient.so.0.0.0  libidn2.so.0        libnettle.so.8         libpsx.so.2.66         libtasn1.so.6.6.3
ld-linux-x86-64.so.2     libcap.so.2.66                               libdl.so.2                 libidn2.so.0.4.0    libnettle.so.8.8       libpthread.so.0        libthread_db.so.1
libBrokenLocale.so.1     libcom_err.so.2                              libdrop_ambient.so.0       libksba.so.8        libnpth.so.0           libreadline.so.8       libtic.so.6
libacl.so.1              libcom_err.so.2.1                            libdrop_ambient.so.0.0.0   libksba.so.8.14.6   libnpth.so.0.1.2       libreadline.so.8.2     libtic.so.6.4
libacl.so.1.1.2302       libcrypt.so.1                                libe2p.so.2                liblber.so.2        libnsl.so.1            libresolv.so.2         libtinfo.so.6
libanl.so.1              libcrypt.so.1.1.0                            libe2p.so.2.3              liblber.so.2.0.200  libnss_compat.so.2     librt.so.1             libtinfo.so.6.4
libapt-pkg.so.6.0        libcrypto.so.3                               libext2fs.so.2             libldap.so.2        libnss_dns.so.2        libsasl2.so.2          libudev.so.1
libapt-pkg.so.6.0.0      libcudnn.so.9                                libext2fs.so.2.4           libldap.so.2.0.200  libnss_files.so.2      libsasl2.so.2.0.25     libudev.so.1.7.8
libapt-private.so.0.0    libcudnn.so.9.10.2                           libffi.so.8                liblz4.so.1         libnss_hesiod.so.2     libseccomp.so.2        libunistring.so.5
libapt-private.so.0.0.0  libcudnn_adv.so.9                            libffi.so.8.1.4            liblz4.so.1.9.4     libp11-kit.so.0        libseccomp.so.2.5.5    libunistring.so.5.0.0
libassuan.so.0           libcudnn_adv.so.9.10.2                       libformw.so.6              liblzma.so.5        libp11-kit.so.0.3.1    libselinux.so.1        libutil.so.1
libassuan.so.0.8.6       libcudnn_cnn.so.9                            libformw.so.6.4            liblzma.so.5.4.5    libpam.so.0            libsemanage.so.2       libuuid.so.1
libattr.so.1             libcudnn_cnn.so.9.10.2                       libgcc_s.so.1              libm.so.6           libpam.so.0.85.1       libsepol.so.2          libuuid.so.1.3.0
libattr.so.1.1.2502      libcudnn_engines_precompiled.so.9            libgcrypt.so.20            libmd.so.0          libpam_misc.so.0       libsmartcols.so.1      libxxhash.so.0
libaudit.so.1            libcudnn_engines_precompiled.so.9.10.2       libgcrypt.so.20.4.3        libmd.so.0.1.0      libpam_misc.so.0.82.1  libsmartcols.so.1.1.0  libxxhash.so.0.8.2
libaudit.so.1.0.0        libcudnn_engines_runtime_compiled.so.9       libgmp.so.10               libmemusage.so      libpamc.so.0           libsqlite3.so.0        libz.so.1
libblkid.so.1            libcudnn_engines_runtime_compiled.so.9.10.2  libgmp.so.10.5.0           libmenuw.so.6       libpamc.so.0.82.1      libsqlite3.so.0.8.6    libz.so.1.3
libblkid.so.1.1.0        libcudnn_graph.so.9                          libgnutls.so.30            libmenuw.so.6.4     libpanelw.so.6         libss.so.2             libzstd.so.1
libbz2.so.1              libcudnn_graph.so.9.10.2                     libgnutls.so.30.37.1       libmount.so.1       libpanelw.so.6.4       libss.so.2.0           libzstd.so.1.5.5
libbz2.so.1.0            libcudnn_heuristic.so.9                      libgpg-error.so.0          libmount.so.1.1.0   libpcprofile.so        libssl.so.3            ossl-modules
libbz2.so.1.0.4          libcudnn_heuristic.so.9.10.2                 libgpg-error.so.0.34.0     libmvec.so.1        libpcre2-8.so.0        libstdc++.so.6         perl-base
libc.so.6                libcudnn_ops.so.9                            libhistory.so.8            libnccl.so.2        libpcre2-8.so.0.11.2   libstdc++.so.6.0.33    sasl2
libc_malloc_debug.so.0   libcudnn_ops.so.9.10.2                       libhistory.so.8.2          libnccl.so.2.27.3   libproc2.so.0          libsystemd.so.0        security

compared to the contents of the same directory in a container using the same image but started manually:

e2fsprogs                         libcudadebugger.so.580.82.07                 liblber.so.2                      libnvidia-glsi.so.580.82.07             libreadline.so.8.2
engines-3                         libcudnn.so.9                                liblber.so.2.0.200                libnvidia-glvkspirv.so.580.82.07        libresolv.so.2
gbm                               libcudnn.so.9.10.2                           libldap.so.2                      libnvidia-gpucomp.so.580.82.07          librt.so.1
gconv                             libcudnn_adv.so.9                            libldap.so.2.0.200                libnvidia-gtk2.so.580.82.07             libsasl2.so.2
ld-linux-x86-64.so.2              libcudnn_adv.so.9.10.2                       liblz4.so.1                       libnvidia-gtk3.so.580.82.07             libsasl2.so.2.0.25
libBrokenLocale.so.1              libcudnn_cnn.so.9                            liblz4.so.1.9.4                   libnvidia-ml.so.1                       libseccomp.so.2
libEGL_nvidia.so.0                libcudnn_cnn.so.9.10.2                       liblzma.so.5                      libnvidia-ml.so.580.82.07               libseccomp.so.2.5.5
libEGL_nvidia.so.580.82.07        libcudnn_engines_precompiled.so.9            liblzma.so.5.4.5                  libnvidia-ngx.so.1                      libselinux.so.1
libGLESv1_CM_nvidia.so.1          libcudnn_engines_precompiled.so.9.10.2       libm.so.6                         libnvidia-ngx.so.580.82.07              libsemanage.so.2
libGLESv1_CM_nvidia.so.580.82.07  libcudnn_engines_runtime_compiled.so.9       libmd.so.0                        libnvidia-nvvm.so.4                     libsepol.so.2
libGLESv2_nvidia.so.2             libcudnn_engines_runtime_compiled.so.9.10.2  libmd.so.0.1.0                    libnvidia-nvvm.so.580.82.07             libsmartcols.so.1
libGLESv2_nvidia.so.580.82.07     libcudnn_graph.so.9                          libmemusage.so                    libnvidia-opencl.so.1                   libsmartcols.so.1.1.0
libGLX_indirect.so.0              libcudnn_graph.so.9.10.2                     libmenuw.so.6                     libnvidia-opencl.so.580.82.07           libsqlite3.so.0
libGLX_nvidia.so.0                libcudnn_heuristic.so.9                      libmenuw.so.6.4                   libnvidia-opticalflow.so                libsqlite3.so.0.8.6
libGLX_nvidia.so.580.82.07        libcudnn_heuristic.so.9.10.2                 libmount.so.1                     libnvidia-opticalflow.so.1              libss.so.2
libacl.so.1                       libcudnn_ops.so.9                            libmount.so.1.1.0                 libnvidia-opticalflow.so.580.82.07      libss.so.2.0
libacl.so.1.1.2302                libcudnn_ops.so.9.10.2                       libmvec.so.1                      libnvidia-pkcs11-openssl3.so.580.82.07  libssl.so.3
libanl.so.1                       libdb-5.3.so                                 libnccl.so.2                      libnvidia-present.so.580.82.07          libstdc++.so.6
libapt-pkg.so.6.0                 libdebconfclient.so.0                        libnccl.so.2.27.3                 libnvidia-ptxjitcompiler.so.1           libstdc++.so.6.0.33
libapt-pkg.so.6.0.0               libdebconfclient.so.0.0.0                    libncursesw.so.6                  libnvidia-ptxjitcompiler.so.580.82.07   libsystemd.so.0
libapt-private.so.0.0             libdl.so.2                                   libncursesw.so.6.4                libnvidia-rtcore.so.580.82.07           libsystemd.so.0.38.0
libapt-private.so.0.0.0           libdrop_ambient.so.0                         libnettle.so.8                    libnvidia-sandboxutils.so.1             libtasn1.so.6
libassuan.so.0                    libdrop_ambient.so.0.0.0                     libnettle.so.8.8                  libnvidia-sandboxutils.so.580.82.07     libtasn1.so.6.6.3
libassuan.so.0.8.6                libe2p.so.2                                  libnpth.so.0                      libnvidia-tls.so.580.82.07              libthread_db.so.1
libattr.so.1                      libe2p.so.2.3                                libnpth.so.0.1.2                  libnvidia-vksc-core.so.1                libtic.so.6
libattr.so.1.1.2502               libext2fs.so.2                               libnsl.so.1                       libnvidia-vksc-core.so.580.82.07        libtic.so.6.4
libaudit.so.1                     libext2fs.so.2.4                             libnss_compat.so.2                libnvidia-wayland-client.so.580.82.07   libtinfo.so.6
libaudit.so.1.0.0                 libffi.so.8                                  libnss_dns.so.2                   libnvoptix.so.1                         libtinfo.so.6.4
libblkid.so.1                     libffi.so.8.1.4                              libnss_files.so.2                 libnvoptix.so.580.82.07                 libudev.so.1
libblkid.so.1.1.0                 libformw.so.6                                libnss_hesiod.so.2                libp11-kit.so.0                         libudev.so.1.7.8
libbz2.so.1                       libformw.so.6.4                              libnvcuvid.so                     libp11-kit.so.0.3.1                     libunistring.so.5
libbz2.so.1.0                     libgcc_s.so.1                                libnvcuvid.so.1                   libpam.so.0                             libunistring.so.5.0.0
libbz2.so.1.0.4                   libgcrypt.so.20                              libnvcuvid.so.580.82.07           libpam.so.0.85.1                        libutil.so.1
libc.so.6                         libgcrypt.so.20.4.3                          libnvidia-allocator.so.1          libpam_misc.so.0                        libuuid.so.1
libc_malloc_debug.so.0            libgmp.so.10                                 libnvidia-allocator.so.580.82.07  libpam_misc.so.0.82.1                   libuuid.so.1.3.0
libcap-ng.so.0                    libgmp.so.10.5.0                             libnvidia-cfg.so.1                libpamc.so.0                            libxxhash.so.0
libcap-ng.so.0.0.0                libgnutls.so.30                              libnvidia-cfg.so.580.82.07        libpamc.so.0.82.1                       libxxhash.so.0.8.2
libcap.so.2                       libgnutls.so.30.37.1                         libnvidia-egl-gbm.so.1            libpanelw.so.6                          libz.so.1
libcap.so.2.66                    libgpg-error.so.0                            libnvidia-egl-gbm.so.1.1.2        libpanelw.so.6.4                        libz.so.1.3
libcom_err.so.2                   libgpg-error.so.0.34.0                       libnvidia-egl-wayland.so.1        libpcprofile.so                         libzstd.so.1
libcom_err.so.2.1                 libhistory.so.8                              libnvidia-egl-wayland.so.1.1.19   libpcre2-8.so.0                         libzstd.so.1.5.5
libcrypt.so.1                     libhistory.so.8.2                            libnvidia-eglcore.so.580.82.07    libpcre2-8.so.0.11.2                    nvidia
libcrypt.so.1.1.0                 libhogweed.so.6                              libnvidia-encode.so               libproc2.so.0                           ossl-modules
libcrypto.so.3                    libhogweed.so.6.8                            libnvidia-encode.so.1             libproc2.so.0.0.2                       perl-base
libcuda.so                        libidn2.so.0                                 libnvidia-encode.so.580.82.07     libpsx.so.2                             sasl2
libcuda.so.1                      libidn2.so.0.4.0                             libnvidia-fbc.so.1                libpsx.so.2.66                          security
libcuda.so.580.82.07              libksba.so.8                                 libnvidia-fbc.so.580.82.07        libpthread.so.0                         vdpau
libcudadebugger.so.1              libksba.so.8.14.6                            libnvidia-glcore.so.580.82.07     libreadline.so.8

The pattern in the two listings is telling: the cuDNN libraries (libcudnn*) ship inside the image itself, which is why they appear in both, whereas libcuda.so, the libnvidia-* libraries and nvidia-smi belong to the host driver and are normally injected into the container at start-up by the Nvidia Container Toolkit. Their absence means that this injection step never happens for containers created by the GitLab runner.
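
A direct diff of the two directory listings confirms this: every entry unique to the manually started container is a driver component (a sketch; runner-ctr and manual-ctr are hypothetical container names):

    podman exec runner-ctr ls /usr/lib/x86_64-linux-gnu > runner.txt
    podman exec manual-ctr ls /usr/lib/x86_64-linux-gnu > manual.txt
    diff runner.txt manual.txt    # differences are libcuda*, libnvidia-* and related files only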

Possible issue

Even though Podman is supposed to be a drop-in replacement for Docker, and thus (supposedly) usable interchangeably as an executor for the GitLab runner, I suspect that the gpus setting, which the runner passes on to Podman, is the problem here.

Podman, being a CDI tool, uses a different flag, namely --device, together with CDI notation, e.g. nvidia.com/gpu=all, whereas Docker uses --gpus with its own notation, e.g. all.
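
For contrast, the two invocations for the same request look like this (the Docker line assumes a host with Docker and the Nvidia toolkit configured; it is not from this setup):

    # Docker: --gpus with Docker's own notation
    docker run --rm --gpus all nvidia/cuda:12.9.1-cudnn-runtime-ubuntu24.04 nvidia-smi

    # Podman: --device with fully-qualified CDI device names
    podman run --rm --device nvidia.com/gpu=all nvidia/cuda:12.9.1-cudnn-runtime-ubuntu24.04 nvidia-smi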

I tried to approximate the arguments with which the runner invokes Podman (sadly I do not know how to obtain the exact invocation) and managed to reproduce the same warning when trying to run nvidia-smi:

podman run --rm --gpus all --security-opt=label=disable -it nvidia/cuda:12.9.1-cudnn-runtime-ubuntu24.04 nvidia-smi

==========
== CUDA ==
==========

CUDA Version 12.9.1

Container image Copyright (c) 2016-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license

A copy of this license is made available in this container at /NGC-DL-CONTAINER-LICENSE for your convenience.

WARNING: The NVIDIA Driver was not detected.  GPU functionality will not be available.
   Use the NVIDIA Container Toolkit to start this container with GPU support; see
   https://docs.nvidia.com/datacenter/cloud-native/ .

/opt/nvidia/nvidia_entrypoint.sh: line 67: exec: nvidia-smi: not found

This clearly leads to the conclusion that Podman is not fully compatible with Docker as a GitLab runner executor when it comes to GPU support.
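
If that suspicion is correct, it should also be reproducible at the API level: the runner talks to the socket through Docker's REST API, where --gpus is encoded as a HostConfig.DeviceRequests entry. A hedged probe could look like the following (assuming curl and the socket path from the configuration above; the payload mirrors what docker run --gpus all would send, and if Podman's compatibility layer ignores or rejects DeviceRequests, that would explain the missing injection):

    curl -s --unix-socket /run/podman/podman.sock \
      -H 'Content-Type: application/json' \
      -d '{"Image": "nvidia/cuda:12.9.1-cudnn-runtime-ubuntu24.04",
           "Cmd": ["nvidia-smi"],
           "HostConfig": {"DeviceRequests": [{"Driver": "", "Count": -1, "Capabilities": [["gpu"]]}]}}' \
      http://localhost/containers/create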
