[FATAL tini (7)] exec sh failed: No such file or directory with some images
Summary
Together with @niklasjanz, I have been investigating a strange behavior of gitlab-runner 16.6.0 (it also seems to appear in 16.7.0) since 2024/01/24, without much success.
I started to notice it with some images:
Success image: nixpkgs/nix-flakes@sha256:9e926289a14133f44aad5e2063a3d76f6afc54e8967ecc75b40c8fdcc7b778aa
This image runs fine on my runner.
Failing image: nixpkgs/nix-flakes@sha256:bdb4fa240ee539fbe85850a63d3424208c3b4d86c186323474e9b52c5436434e
This image fails to start, and the job log shows:
[FATAL tini (7)] exec sh failed: No such file or directory
Cleaning up project directory and file based variables 00:01
ERROR: Job failed: exit code 127
These two images should be extremely similar.
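To check how similar the two images actually are, their image configs (entrypoint, command, environment including PATH) can be diffed. This is a rough, untested sketch assuming bash and the Docker CLI on the host; the names GOOD, BAD and dump_config are our own:

```shell
#!/usr/bin/env bash
# Rough sketch: diff the image configs (Entrypoint, Cmd, Env incl. PATH, ...)
# of the working and the failing nix-flakes digests. Needs bash and Docker.
set -o pipefail

GOOD="nixpkgs/nix-flakes@sha256:9e926289a14133f44aad5e2063a3d76f6afc54e8967ecc75b40c8fdcc7b778aa"
BAD="nixpkgs/nix-flakes@sha256:bdb4fa240ee539fbe85850a63d3424208c3b4d86c186323474e9b52c5436434e"

# Print the image config as JSON, split on commas (one field per line),
# so that diff shows which fields differ.
dump_config() {
  docker image inspect --format '{{json .Config}}' "$1" | tr ',' '\n'
}

# Make sure both images are available locally before inspecting them.
for ref in "$GOOD" "$BAD"; do
  docker pull --quiet "$ref" >/dev/null || echo "pull failed: $ref" >&2
done

# pipefail makes a failing `docker image inspect` fail the whole pipeline.
if good_cfg="$(dump_config "$GOOD")" && bad_cfg="$(dump_config "$BAD")"; then
  diff <(printf '%s\n' "$good_cfg") <(printf '%s\n' "$bad_cfg") && echo "image configs are identical"
else
  echo "could not inspect one of the images" >&2
fi
```

This only compares metadata; the layers (the /nix/store contents) can still differ between the two digests even if the configs match.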
At first sight this looks simple: sh is just not in the image, or not in the PATH. But that is not the case:
docker run -it nixpkgs/nix-flakes@sha256:bdb4fa240ee539fbe85850a63d3424208c3b4d86c186323474e9b52c5436434e sh
sh-5.2#
Running this docker run command directly on my runner instance also works.
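The manual run above does not use the bind mounts that the runner adds to every job container (the volumes list in [runners.docker] of the runner configuration below). A rough, untested sketch that replays the same command with the Nix-related mounts from that list, assuming bash and access to the Docker daemon on the runner host; IMAGE and RUNNER_MOUNTS are our own names:

```shell
#!/usr/bin/env bash
# Rough sketch: run the failing image once as in the manual test and once with
# the Nix bind mounts from the runner's volumes setting, then compare.
IMAGE="nixpkgs/nix-flakes@sha256:bdb4fa240ee539fbe85850a63d3424208c3b4d86c186323474e9b52c5436434e"

# Copied from volumes = [...] in [runners.docker]; /certs/client, /cache and
# the docker.sock mount are intentionally left out of this comparison.
RUNNER_MOUNTS=(
  -v /nix/store:/nix/store:ro
  -v /nix/var/nix/db:/nix/var/nix/db:ro
  -v /nix/var/nix/daemon-socket:/nix/var/nix/daemon-socket:ro
)

echo "== plain docker run =="
docker run --rm "$IMAGE" sh -c 'echo "sh ok"' || echo "failed with exit code $?"

echo "== with the runner's Nix mounts =="
docker run --rm "${RUNNER_MOUNTS[@]}" "$IMAGE" sh -c 'echo "sh ok"' || echo "failed with exit code $?"
```

If only the second run fails with a "no such file or directory" error, that would point at the mounts rather than at the runner binary itself; if both succeed, the difference lies elsewhere in how the runner starts the container.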
The strange part is that the issue does not happen for everyone: the same image worked fine for @niklasjanz.
I also found that the image sometimes works, depending on which nixpkgs version I target (I build the runner's GCE image with Nix).
I also tried gitlab-runner 16.7.0 with the same nixpkgs revision as this failing runner (6723fa4e4f1a30d42a633bef5eb01caeb281adc3), i.e. the same image where the only real change is gitlab-runner 16.7.0 instead of 16.6.0. Again, nixpkgs/nix-flakes@sha256:bdb4fa240ee539fbe85850a63d3424208c3b4d86c186323474e9b52c5436434e failed, but this time with a slightly different error:
Using docker image sha256:6e7bc7558e536d3c99e3f611960200e4055bfbfd84c186c0e664702a6b8ea943 for nixpkgs/nix-flakes@sha256:bdb4fa240ee539fbe85850a63d3424208c3b4d86c186323474e9b52c5436434e with digest nixpkgs/nix-flakes@sha256:bdb4fa240ee539fbe85850a63d3424208c3b4d86c186323474e9b52c5436434e ...
exec /usr/bin/sh: no such file or directory
Again, /usr/bin/sh does exist in the image.
But when I deploy the runner from nixpkgs ef3a4d52ff55537a4b649a1c28c8c5abb0691461 (Dec 20), the failing image, to my surprise, works.
So I tried to git bisect between the commit that works and the one that doesn't, but unfortunately I could not find the commit that actually fixes the issue: the bisect ended on https://github.com/NixOS/nixpkgs/commit/856baf418ad4f1520855fab5e12e8cd4b881e774, which seems completely unrelated.
I believe there is some randomness involved in this issue, or at least something state-related: once, the issue fixed itself after I shut the instance down and started it up again. This is probably why the git bisect was unsuccessful.
I'm not alone with this. Someone using GitHub runners reports a similar issue: they can also docker run their image fine, but on the runner they get "no such file or directory": https://github.com/orgs/community/discussions/61493. I also found https://forum.gitlab.com/t/gitlab-ci-images-not-finding-entrypoint-file-in-path/88949 and #1702 (closed).
Steps to reproduce
The easiest way to reproduce is to build a GCE image with Nix:
flake.nix:
{
  description = "A basic flake with a shell";
  inputs.nixpkgs.url = "github:NixOS/nixpkgs/6723fa4e4f1a30d42a633bef5eb01caeb281adc3";
  inputs.flake-utils.url = "github:numtide/flake-utils";
  inputs.nix-glab-runner.url = "github:pcboy/nix-glab-runner";
  inputs.nix-glab-runner.inputs.nixpkgs.follows = "nixpkgs";

  outputs = {
    nixpkgs,
    flake-utils,
    nix-glab-runner,
    ...
  }:
    flake-utils.lib.eachDefaultSystem (system: let
      pkgs = nixpkgs.legacyPackages.${system};
      glabBuilders = nix-glab-runner.builders.${system};
    in {
      devShells.default = pkgs.mkShell {
        packages = [pkgs.bashInteractive pkgs.sops pkgs.google-cloud-sdk];
      };
      packages.uploadGceImage = glabBuilders.uploadGceImage {
        bucket = "your-gcs-bucket-for-GCE-image";
        gceImage = glabBuilders.gceImage {extraModules = [./extra_config.nix];};
        imagePrefixName = "nixos-image-gitlab-debug-23.11.x86_64-linux";
      };
    });
}
Change bucket to point to a GCS bucket you have access to. The flake also needs a second file:
extra_config.nix:
{
  lib,
  pkgs,
  ...
}: {
  # Create a user for SSH access, as the standard Google IAM login fails
  users.users.gitlab = {
    isNormalUser = true;
    extraGroups = ["wheel" "networkmanager"];
  };

  # Let the gitlab user run `sudo` without a password
  security.sudo.extraRules = [
    {
      users = ["gitlab"];
      commands = [
        {
          command = "ALL";
          options = ["NOPASSWD"];
        }
      ];
    }
  ];

  users.users.gitlab.openssh.authorizedKeys.keys = [
    "YOUR SSH PUBKEY"
  ];

  services.gitlab-runner.services.nix.tagList = lib.mkForce ["nix-runner-docker-debug"];
}
Then build and upload the image:
nix run .\#uploadGceImage
Then create your instance (change CI_SERVER_URL and the token on the first line):
echo 'printf "CI_SERVER_URL=https://gitlab.com\nREGISTRATION_TOKEN=glrt-TOKEN" > /etc/gitlab-runner-env' > startup_script.sh
gcloud compute instances create gitlab-runner-nix-debug \
--project=network-setup-309306 \
--zone=us-central1-a \
--machine-type=e2-medium \
--network-interface=network-tier=PREMIUM,stack-type=IPV4_ONLY,subnet=listenfield-us-central1-pri-net \
--metadata=enable-oslogin=TRUE \
--can-ip-forward \
--service-account=$SERVICE_ACCOUNT \
--scopes=https://www.googleapis.com/auth/cloud-platform \
--tags=allow-ssh \
--create-disk=auto-delete=yes,boot=yes,device-name=gitlab-runner-nix,image=projects/$GCP_PROJECT/global/images/$IMAGE_NAME,mode=rw,size=100,type=projects/$GCP_PROJECT/zones/us-central1-a/diskTypes/pd-balanced \
--labels=goog-ec-src=vm_add-gcloud \
--reservation-affinity=any \
--metadata-from-file startup-script=./startup_script.sh
Assuming the commands above are saved as create_instance.sh, run it as:
IMAGE_NAME="nixos-image-gitlab-debug-23-11-x86-64-linux" GCP_PROJECT="your-project" SERVICE_ACCOUNT="your@SA" ./create_instance.sh
If installing Nix is too much work, I can also share an already built GCE image (I already shared one with @niklasjanz).
Here is a .gitlab-ci.yml by @niklasjanz that makes the issue easy to reproduce:
.gitlab-ci.yml
stages:
  - reproduce

i-should-work:
  stage: reproduce
  image: nixpkgs/nix-flakes@sha256:9e926289a14133f44aad5e2063a3d76f6afc54e8967ecc75b40c8fdcc7b778aa
  parallel:
    matrix:
      - TAG: ["16.6", "16.7"]
  script:
    - echo "Hello world! 🙃"
  tags:
    - $TAG

i-should-fail:
  stage: reproduce
  image: nixpkgs/nix-flakes@sha256:bdb4fa240ee539fbe85850a63d3424208c3b4d86c186323474e9b52c5436434e
  parallel:
    matrix:
      - TAG: ["16.6", "16.7"]
  script:
    - echo "Hello world! 🙃"
  tags:
    - $TAG
Actual behavior
The CI fails with:
[FATAL tini (7)] exec sh failed: No such file or directory
Expected behavior
The job should be able to run sh: it is in the image and in the PATH, and a plain docker run has no problem starting this image.
Relevant logs and/or screenshots
job log
Running with gitlab-runner 16.6.0 (v16.6.0)
on nix__220604e3c360 fekHFf4g7, system ID: s_34b1270b9f88
feature flags: FF_NETWORK_PER_BUILD:true
Preparing the "docker" executor 00:01
Using Docker executor with image nixpkgs/nix-flakes@sha256:bdb4fa240ee539fbe85850a63d3424208c3b4d86c186323474e9b52c5436434e ...
Pulling docker image nixpkgs/nix-flakes@sha256:bdb4fa240ee539fbe85850a63d3424208c3b4d86c186323474e9b52c5436434e ...
Using docker image sha256:6e7bc7558e536d3c99e3f611960200e4055bfbfd84c186c0e664702a6b8ea943 for nixpkgs/nix-flakes@sha256:bdb4fa240ee539fbe85850a63d3424208c3b4d86c186323474e9b52c5436434e with digest nixpkgs/nix-flakes@sha256:bdb4fa240ee539fbe85850a63d3424208c3b4d86c186323474e9b52c5436434e ...
Preparing environment 00:01
Running on runner-fekhff4g7-project-54290697-concurrent-0 via gitlab-runner-nix-debug.c.njanz-2844107e.internal...
Getting source from Git repository 00:01
Fetching changes with git depth set to 20...
Reinitialized existing Git repository in /builds/customer-issue-reproduction/zd-493788/.git/
Checking out c32029f1 as detached HEAD (ref is main)...
Skipping Git submodules setup
Executing "step_script" stage of the job script 00:01
Using docker image sha256:6e7bc7558e536d3c99e3f611960200e4055bfbfd84c186c0e664702a6b8ea943 for nixpkgs/nix-flakes@sha256:bdb4fa240ee539fbe85850a63d3424208c3b4d86c186323474e9b52c5436434e with digest nixpkgs/nix-flakes@sha256:bdb4fa240ee539fbe85850a63d3424208c3b4d86c186323474e9b52c5436434e ...
[FATAL tini (7)] exec sh failed: No such file or directory
Cleaning up project directory and file based variables 00:00
ERROR: Job failed: exit code 127
Environment description
This is a custom gitlab-runner running on GCP, with the VM image built with Nix.
docker info contents
Client:
Version: 24.0.5
Context: default
Debug Mode: false
Plugins:
buildx: Docker Buildx (Docker Inc.)
Version: v0.11.2
Path: /nix/store/zv7dn5kp6wndgr0cwdimy2vgn9x8b8y6-docker-plugins/libexec/docker/cli-plugins/docker-buildx
compose: Docker Compose (Docker Inc.)
Version: 2.23.1
Path: /nix/store/zv7dn5kp6wndgr0cwdimy2vgn9x8b8y6-docker-plugins/libexec/docker/cli-plugins/docker-compose
Server:
Containers: 18
Running: 10
Paused: 0
Stopped: 8
Images: 17
Server Version: 24.0.5
Storage Driver: overlay2
Backing Filesystem: extfs
Supports d_type: true
Using metacopy: false
Native Overlay Diff: true
userxattr: false
Logging Driver: journald
Cgroup Driver: systemd
Cgroup Version: 2
Plugins:
Volume: local
Network: bridge host ipvlan macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
Swarm: inactive
Runtimes: io.containerd.runc.v2 runc
Default Runtime: runc
Init Binary: docker-init
containerd version: v1.7.11
runc version:
init version:
Security Options:
seccomp
Profile: builtin
cgroupns
Kernel Version: 6.1.74
Operating System: NixOS 23.11 (Tapir)
OSType: linux
Architecture: x86_64
CPUs: 2
Total Memory: 7.771GiB
Name: gitlab-runner-nix.us-central1-a.c.network-setup-309306.internal
ID: 6aaa9e7d-1258-4941-9c3b-1ae8f1bb474c
Docker Root Dir: /var/lib/docker
Debug Mode: false
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: true
config.toml contents
checkInterval = 30
concurrent = 4
[[runners]]
name = "nix__8c59ad2221e5"
url = "https://gitlab.com"
id = 29353640
token = "glrt-TOKEN"
token_obtained_at = "2024-01-23T04:17:14+00:00"
token_expires_at = "0001-01-01T00:00:00+00:00"
executor = "docker"
environment = ["DOCKER_CERT_PATH=/certs/client", "DOCKER_DRIVER=overlay2", "DOCKER_TLS_VERIFY=1", "ENV=/etc/profile", "FF_NETWORK_PER_BUILD=true", "NIX_REMOTE=daemon", "NIX_SSL_CERT_FILE=/nix/var/nix/profiles/default/etc/ssl/certs/ca-bundle.crt", "USER=root"]
pre_build_script = "/nix/store/2by5jjwnsk16vwaf5n1kvchk1v5lw7sg-setup-container"
[runners.cache]
MaxUploadedArchiveSize = 0
[runners.docker]
tls_verify = false
image = "alpine"
privileged = true
disable_entrypoint_overwrite = false
oom_kill_disable = false
disable_cache = false
volumes = ["/certs/client", "/cache", "/nix/store:/nix/store:ro", "/nix/var/nix/db:/nix/var/nix/db:ro", "/nix/var/nix/daemon-socket:/nix/var/nix/daemon-socket:ro", "/var/run/docker.sock:/var/run/docker.sock"]
shm_size = 0
network_mtu = 0
Used GitLab Runner version
Version: 16.7.0
Git revision: v16.7.0
Git branch: HEAD
GO version: go1.20.12
Built: unknown
OS/Arch: linux/amd64
This is also happening with 16.6.0.
Possible fixes
At the moment, it seems to fail at this point: https://gitlab.com/gitlab-org/gitlab-runner/-/blob/v16.6.0/executors/docker/docker.go?ref_type=tags#L1218