[FATAL tini (7)] exec sh failed: No such file or directory with some images

Summary

Together with @niklasjanz, I have been investigating a strange behavior with gitlab-runner 16.6.0 (it also seems to appear on 16.7.0) since 2024/01/24, without much success.

I started to notice a strange behavior with some images.

Success image: nixpkgs/nix-flakes@sha256:9e926289a14133f44aad5e2063a3d76f6afc54e8967ecc75b40c8fdcc7b778aa

This image runs fine on my runner.

Failing image: nixpkgs/nix-flakes@sha256:bdb4fa240ee539fbe85850a63d3424208c3b4d86c186323474e9b52c5436434e

This image fails to start. It gives me:

[FATAL tini (7)] exec sh failed: No such file or directory
Cleaning up project directory and file based variables 00:01
ERROR: Job failed: exit code 127

These 2 images should be extremely similar.

At first glance it looks simple: sh is just not in the image, or not in the PATH. But that's not the case.

docker run -it nixpkgs/nix-flakes@sha256:bdb4fa240ee539fbe85850a63d3424208c3b4d86c186323474e9b52c5436434e sh
sh-5.2# 

I also tried running this docker run command directly on my runner instance, and it works.
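
One difference between the manual docker run and a CI job is that the runner applies the [runners.docker] settings from config.toml (listed under "config.toml contents" below). A minimal sketch of a closer manual test, assuming those settings matter: it builds a docker run command line with the same bind mounts and the privileged flag. The helper name `runner_like_cmd` is made up for illustration, and the runner's own entrypoint handling and per-build network are not reproduced here.

```shell
#!/bin/sh
# Sketch only: mimic part of the runner's [runners.docker] settings by hand.
# `runner_like_cmd` is a hypothetical helper, not part of gitlab-runner.

IMAGE="nixpkgs/nix-flakes@sha256:bdb4fa240ee539fbe85850a63d3424208c3b4d86c186323474e9b52c5436434e"

# Bind mounts copied from the `volumes` entry in config.toml.
# Anonymous volumes (/certs/client, /cache) are left out.
RUNNER_VOLUMES="/nix/store:/nix/store:ro
/nix/var/nix/db:/nix/var/nix/db:ro
/nix/var/nix/daemon-socket:/nix/var/nix/daemon-socket:ro
/var/run/docker.sock:/var/run/docker.sock"

runner_like_cmd() {
  # $1 = image, remaining arguments = command to run in the container
  image=$1
  shift
  cmd="docker run --rm --privileged"
  for v in $RUNNER_VOLUMES; do
    cmd="$cmd -v $v"
  done
  printf '%s\n' "$cmd $image $*"
}

# Prints the command line; pipe the output to `sh` to actually run it.
runner_like_cmd "$IMAGE" sh -c true
```

If the printed command fails on the runner host in the same way as the CI job while the plain docker run succeeds, that would point at the mounts or the privileged flag rather than at the image itself.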

The strange part is that the issue does not seem to happen for everyone: the Docker image was working fine for @niklasjanz.
I also found that the image sometimes works when I build the runner from different nixpkgs versions (I'm building a GCE image with Nix).
I also tried running 16.7.0 with the same nixpkgs version as the failing runner (6723fa4e4f1a30d42a633bef5eb01caeb281adc3), i.e. the same image, where the only thing that really changes is gitlab-runner 16.7.0 instead of 16.6.0. Again, nixpkgs/nix-flakes@sha256:bdb4fa240ee539fbe85850a63d3424208c3b4d86c186323474e9b52c5436434e failed, but this time the error looks slightly different:

Using docker image sha256:6e7bc7558e536d3c99e3f611960200e4055bfbfd84c186c0e664702a6b8ea943 for nixpkgs/nix-flakes@sha256:bdb4fa240ee539fbe85850a63d3424208c3b4d86c186323474e9b52c5436434e with digest nixpkgs/nix-flakes@sha256:bdb4fa240ee539fbe85850a63d3424208c3b4d86c186323474e9b52c5436434e ...
exec /usr/bin/sh: no such file or directory

Again, /usr/bin/sh is there though.
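
For context, exec can report "no such file or directory" even when the path is listed in the image: the kernel returns the same error when a symlink in the chain dangles, or when the binary's ELF interpreter (dynamic loader) is missing. A minimal sketch that checks the first of these cases; `check_exec_target` is a hypothetical helper name, and where to run it (for example against the container's filesystem with the runner's mounts applied) is left open, since the failing container has no working shell.

```shell
#!/bin/sh
# Sketch only: report whether a path is missing, a dangling symlink,
# or resolves to an executable file. Does not inspect the ELF interpreter.
check_exec_target() {
  path=$1
  # -e follows symlinks, so a dangling symlink fails -e but passes -L.
  if [ ! -e "$path" ] && [ ! -L "$path" ]; then
    echo "missing: $path"
    return 1
  fi
  if [ -L "$path" ] && [ ! -e "$path" ]; then
    echo "dangling symlink: $path -> $(readlink "$path")"
    return 1
  fi
  if [ ! -x "$path" ]; then
    echo "not executable: $path"
    return 1
  fi
  echo "ok: $path -> $(readlink -f "$path")"
}

# Example: check_exec_target /usr/bin/sh
```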

But if I try to deploy from nixpkgs ef3a4d52ff55537a4b649a1c28c8c5abb0691461 (Dec 20), to my surprise, that failing image now works.
So I tried to git bisect between the commit that works and the one that doesn't, but sadly I couldn't find the real commit that fixed the issue (the bisect ended on https://github.com/NixOS/nixpkgs/commit/856baf418ad4f1520855fab5e12e8cd4b881e774 , which seems completely unrelated).
I believe there is a bit of randomness involved in this issue, or at least something state-related: one time the issue fixed itself after shutting down the instance and starting it up again. This is likely why the git bisect was unsuccessful.

I'm not alone. I found a similar issue from someone using GitHub runners: they can also docker run their image fine, but on the runner they get a "no such file or directory" error: https://github.com/orgs/community/discussions/61493 I also found this one: https://forum.gitlab.com/t/gitlab-ci-images-not-finding-entrypoint-file-in-path/88949 and this one: #1702 (closed)

Steps to reproduce

The best way to reproduce is to use Nix to create a GCE image:

flake.nix:

{
  description = "A basic flake with a shell";
  inputs.nixpkgs.url = "github:NixOS/nixpkgs/6723fa4e4f1a30d42a633bef5eb01caeb281adc3";
  inputs.flake-utils.url = "github:numtide/flake-utils";
  inputs.nix-glab-runner.url = "github:pcboy/nix-glab-runner";
  inputs.nix-glab-runner.inputs.nixpkgs.follows = "nixpkgs";

  outputs = {
    nixpkgs,
    flake-utils,
    nix-glab-runner,
    ...
  }:
    flake-utils.lib.eachDefaultSystem (system: let
      pkgs = nixpkgs.legacyPackages.${system};
      glabBuilders = nix-glab-runner.builders.${system};
    in {
      devShells.default = pkgs.mkShell {
        packages = [pkgs.bashInteractive pkgs.sops pkgs.google-cloud-sdk];
      };

      packages.uploadGceImage = glabBuilders.uploadGceImage {
        bucket = "your-gcs-bucket-for-GCE-image";
        gceImage = glabBuilders.gceImage {extraModules = [./extra_config.nix];};
        imagePrefixName = "nixos-image-gitlab-debug-23.11.x86_64-linux";
      };
    });
}

You have to change the bucket to point to a GCS bucket you have access to. There is a second file:

extra_config.nix:

{
  lib,
  pkgs,
  ...
}: {
  # Create a user for ssh access as standard IAM google login fails
  users.users.gitlab = {
    isNormalUser = true;
    extraGroups = ["wheel" "networkmanager"];
  };

  # Let the gitlab user run `sudo` without a password
  security.sudo.extraRules = [
    {
      users = ["gitlab"];
      commands = [
        {
          command = "ALL";
          options = ["NOPASSWD"];
        }
      ];
    }
  ];

  users.users.gitlab.openssh.authorizedKeys.keys = [
    "YOUR SSH PUBKEY"
  ];
  services.gitlab-runner.services.nix.tagList = lib.mkForce ["nix-runner-docker-debug"];
}

Then run:

nix run .\#uploadGceImage

Then create your instance with the following script, saved as create_instance.sh (change CI_SERVER_URL and the token on the first line):

echo 'printf "CI_SERVER_URL=https://gitlab.com\nREGISTRATION_TOKEN=glrt-TOKEN" > /etc/gitlab-runner-env' > startup_script.sh
gcloud compute instances create gitlab-runner-nix-debug \
    --project=network-setup-309306 \
    --zone=us-central1-a \
    --machine-type=e2-medium \
    --network-interface=network-tier=PREMIUM,stack-type=IPV4_ONLY,subnet=listenfield-us-central1-pri-net \
    --metadata=enable-oslogin=TRUE \
    --can-ip-forward \
    --service-account=$SERVICE_ACCOUNT \
    --scopes=https://www.googleapis.com/auth/cloud-platform \
    --tags=allow-ssh \
    --create-disk=auto-delete=yes,boot=yes,device-name=gitlab-runner-nix,image=projects/$GCP_PROJECT/global/images/$IMAGE_NAME,mode=rw,size=100,type=projects/$GCP_PROJECT/zones/us-central1-a/diskTypes/pd-balanced \
    --labels=goog-ec-src=vm_add-gcloud \
    --reservation-affinity=any \
    --metadata-from-file startup-script=./startup_script.sh

Then you run the script as:

IMAGE_NAME="nixos-image-gitlab-debug-23-11-x86-64-linux" GCP_PROJECT="your-project" SERVICE_ACCOUNT="your@SA" ./create_instance.sh

If installing Nix is too much work, I can also share a prebuilt GCE image (it has already been shared with @niklasjanz).

Here is a .gitlab-ci.yml made by @niklasjanz to easily reproduce:

.gitlab-ci.yml
stages:
  - reproduce

i-should-work:
  stage: reproduce
  image: nixpkgs/nix-flakes@sha256:9e926289a14133f44aad5e2063a3d76f6afc54e8967ecc75b40c8fdcc7b778aa
  parallel:
    matrix:
        - TAG: ["16.6", "16.7"]
  script:
    - echo "Hello world! 🙃"
  tags:
    - $TAG

i-should-fail:
  stage: reproduce
  image: nixpkgs/nix-flakes@sha256:bdb4fa240ee539fbe85850a63d3424208c3b4d86c186323474e9b52c5436434e
  parallel:
    matrix:
        - TAG: ["16.6", "16.7"]
  script:
    - echo "Hello world! 🙃"
  tags:
    - $TAG

Actual behavior

The CI fails with:

[FATAL tini (7)] exec sh failed: No such file or directory

Expected behavior

It should be able to run sh, as it's in the image and in the PATH, and docker has no problem running this image.

Relevant logs and/or screenshots

job log
Running with gitlab-runner 16.6.0 (v16.6.0)
  on nix__220604e3c360 fekHFf4g7, system ID: s_34b1270b9f88
  feature flags: FF_NETWORK_PER_BUILD:true
Preparing the "docker" executor 00:01
Using Docker executor with image nixpkgs/nix-flakes@sha256:bdb4fa240ee539fbe85850a63d3424208c3b4d86c186323474e9b52c5436434e ...
Pulling docker image nixpkgs/nix-flakes@sha256:bdb4fa240ee539fbe85850a63d3424208c3b4d86c186323474e9b52c5436434e ...
Using docker image sha256:6e7bc7558e536d3c99e3f611960200e4055bfbfd84c186c0e664702a6b8ea943 for nixpkgs/nix-flakes@sha256:bdb4fa240ee539fbe85850a63d3424208c3b4d86c186323474e9b52c5436434e with digest nixpkgs/nix-flakes@sha256:bdb4fa240ee539fbe85850a63d3424208c3b4d86c186323474e9b52c5436434e ...
Preparing environment 00:01
Running on runner-fekhff4g7-project-54290697-concurrent-0 via gitlab-runner-nix-debug.c.njanz-2844107e.internal...
Getting source from Git repository 00:01
Fetching changes with git depth set to 20...
Reinitialized existing Git repository in /builds/customer-issue-reproduction/zd-493788/.git/
Checking out c32029f1 as detached HEAD (ref is main)...
Skipping Git submodules setup
Executing "step_script" stage of the job script 00:01
Using docker image sha256:6e7bc7558e536d3c99e3f611960200e4055bfbfd84c186c0e664702a6b8ea943 for nixpkgs/nix-flakes@sha256:bdb4fa240ee539fbe85850a63d3424208c3b4d86c186323474e9b52c5436434e with digest nixpkgs/nix-flakes@sha256:bdb4fa240ee539fbe85850a63d3424208c3b4d86c186323474e9b52c5436434e ...
[FATAL tini (7)] exec sh failed: No such file or directory
Cleaning up project directory and file based variables 00:00
ERROR: Job failed: exit code 127

Environment description

This is a custom gitlab-runner running on GCP, with the image created with Nix.

docker info contents
Client:
 Version:    24.0.5
 Context:    default
 Debug Mode: false
 Plugins:
  buildx: Docker Buildx (Docker Inc.)
    Version:  v0.11.2
    Path:     /nix/store/zv7dn5kp6wndgr0cwdimy2vgn9x8b8y6-docker-plugins/libexec/docker/cli-plugins/docker-buildx
  compose: Docker Compose (Docker Inc.)
    Version:  2.23.1
    Path:     /nix/store/zv7dn5kp6wndgr0cwdimy2vgn9x8b8y6-docker-plugins/libexec/docker/cli-plugins/docker-compose

Server:
 Containers: 18
  Running: 10
  Paused: 0
  Stopped: 8
 Images: 17
 Server Version: 24.0.5
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Using metacopy: false
  Native Overlay Diff: true
  userxattr: false
 Logging Driver: journald
 Cgroup Driver: systemd
 Cgroup Version: 2
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runc.v2 runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: v1.7.11
 runc version: 
 init version: 
 Security Options:
  seccomp
   Profile: builtin
  cgroupns
 Kernel Version: 6.1.74
 Operating System: NixOS 23.11 (Tapir)
 OSType: linux
 Architecture: x86_64
 CPUs: 2
 Total Memory: 7.771GiB
 Name: gitlab-runner-nix.us-central1-a.c.network-setup-309306.internal
 ID: 6aaa9e7d-1258-4941-9c3b-1ae8f1bb474c
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: true

config.toml contents
checkInterval = 30
concurrent = 4
[[runners]]
name = "nix__8c59ad2221e5"
url = "https://gitlab.com"
id = 29353640
token = "glrt-TOKEN"
token_obtained_at = "2024-01-23T04:17:14+00:00"
token_expires_at = "0001-01-01T00:00:00+00:00"
executor = "docker"
environment = ["DOCKER_CERT_PATH=/certs/client", "DOCKER_DRIVER=overlay2", "DOCKER_TLS_VERIFY=1", "ENV=/etc/profile", "FF_NETWORK_PER_BUILD=true", "NIX_REMOTE=daemon", "NIX_SSL_CERT_FILE=/nix/var/nix/profiles/default/etc/ssl/certs/ca-bundle.crt", "USER=root"]
pre_build_script = "/nix/store/2by5jjwnsk16vwaf5n1kvchk1v5lw7sg-setup-container"

[runners.cache]
MaxUploadedArchiveSize = 0

[runners.docker]
tls_verify = false
image = "alpine"
privileged = true
disable_entrypoint_overwrite = false
oom_kill_disable = false
disable_cache = false
volumes = ["/certs/client", "/cache", "/nix/store:/nix/store:ro", "/nix/var/nix/db:/nix/var/nix/db:ro", "/nix/var/nix/daemon-socket:/nix/var/nix/daemon-socket:ro", "/var/run/docker.sock:/var/run/docker.sock"]
shm_size = 0
network_mtu = 0

Used GitLab Runner version

Version:      16.7.0
Git revision: v16.7.0
Git branch:   HEAD
GO version:   go1.20.12
Built:        unknown
OS/Arch:      linux/amd64

This is also happening with 16.6.0.

Possible fixes

At the moment, it seems to be failing here: https://gitlab.com/gitlab-org/gitlab-runner/-/blob/v16.6.0/executors/docker/docker.go?ref_type=tags#L1218

Edited by David Hagege