Kaniko multistage builds failing due to missing files
Summary
GitLab version: 14.5.2
Kaniko version: 1.7.0
To build Docker images for use in AWS, we're following the guide on running autoscaling CI on Fargate: https://docs.gitlab.com/runner/configuration/runner_autoscale_aws_fargate/index.html
Our Dockerfile for the CI builder is derived from one of the official examples, adding only some scripts, more tools etc.: https://gitlab.com/aws-fargate-driver-demo/docker-kaniko-gitlab-ci-fargate/-/blob/master/alpine/Dockerfile
When building a multistage Dockerfile, Kaniko modifies the container's filesystem during the first stage. Subsequent stages can no longer access the repository, and once the executor finishes, every command except the ones Kaniko provides is missing.
This is likely not an issue with Kaniko itself: the behavior makes sense as far as I understand its mechanism. However, I could not find any existing issue reports about it, which seems implausible for something this fundamental.
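To make the suspected mechanism concrete, here is a toy model in bash. It is only a sketch of the sequence described above (keep the stage output, wipe the filesystem, start the next stage) and is not Kaniko's actual code. The directory layout and variable names are assumptions chosen for illustration.

```shell
#!/usr/bin/env bash
# Toy model only: mimics the suspected sequence, not Kaniko's real code.
set -u

root="$(mktemp -d)"   # stands in for the container's / (hypothetical layout)
mkdir -p "$root/builds/repo" "$root/kaniko" "$root/bin"
touch "$root/builds/repo/f1" "$root/builds/repo/f2" "$root/bin/ls"

# Stage c1: COPY ./f1 /f1, then keep f1 somewhere safe for later stages.
cp "$root/builds/repo/f1" "$root/kaniko/f1"

# Between stages: everything outside the protected directory is deleted,
# including the build context and the image's own tools.
find "$root" -mindepth 1 -maxdepth 1 ! -name kaniko -exec rm -rf {} +

# Stage c2: COPY --from=c1 still works, COPY ./f2 cannot find its source.
stage2_status="ok"
[ -e "$root/kaniko/f1" ] || stage2_status="f1 missing"
[ -e "$root/builds/repo/f2" ] || stage2_status="f2 missing from context"
echo "stage c2: $stage2_status"
```

In this model the second stage fails for the same reason a context file would: the source directory was removed together with everything else outside the protected directory.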
Steps to reproduce
- For the builder, the Dockerfile is found here: https://gitlab.com/-/snippets/2254911
- Its docker-entrypoint.sh is found here: https://gitlab.com/-/snippets/2254912
- Its config.json.tpl is this:
{
"auths": {
"${CI_REGISTRY}": {"auth": "${CI_LOGIN}"}
},
"credHelpers": {
"${AWS_ACCOUNT_ID}.dkr.ecr.${AWS_REGION}.amazonaws.com": "ecr-login"
}
}
- Build this image and make it available as the CI coordinator as per the guide. You can likely just strip out all the ECR-related stuff but I'm leaving it in for completeness.
- Create a new repo with this .gitlab-ci.yml in the GitLab instance set up to use Fargate for its builds:
Build:
  stage: build
  before_script:
    - source /context
    - export CI_LOGIN="$(printf "%s:%s"
      "${CI_REGISTRY_USER}" "${CI_REGISTRY_PASSWORD}"
      | base64 | tr -d '\n')"
    - ls -la
    - envsubst < /root/.docker/config.json.tpl > /root/.docker/config.json
  script:
    - executor --context "dir://${CI_PROJECT_DIR}"
      --dockerfile Dockerfile
      --destination "${CI_REGISTRY_IMAGE}:latest" || true
    - ls -la || true
    - echo **
    - echo ${SHELL}
    - echo /**/*
- Create a Dockerfile in the repo root:
FROM scratch AS c1
COPY ./f1 /f1
FROM scratch as c2
COPY --from=c1 /f1 /f1
COPY ./f2 /f2
- touch f1 f2
- Commit and let the build run.
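The repository contents from the last steps can be created in one go. This is a minimal sketch. The scratch directory is an assumption, and in practice the files would go into the root of the new repo before committing.

```shell
#!/usr/bin/env bash
# Sketch: create the minimal repository contents used in the reproduction.
set -u

repo_dir="$(mktemp -d)"   # hypothetical scratch location for the new repo

# Two empty files that the Dockerfile copies from the build context.
touch "$repo_dir/f1" "$repo_dir/f2"

# The two-stage Dockerfile from the report: c2 needs f2 from the build
# context after c1 has already finished.
cat > "$repo_dir/Dockerfile" <<'EOF'
FROM scratch AS c1
COPY ./f1 /f1
FROM scratch AS c2
COPY --from=c1 /f1 /f1
COPY ./f2 /f2
EOF

echo "repository contents written to $repo_dir"
```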
Example Project
Not applicable.
What is the current bug behavior?
This is the important part of the Kaniko output:
INFO[0000] Unpacking rootfs as cmd COPY ./f1 /f1 requires it.
INFO[0000] COPY ./f1 /f1
INFO[0000] Taking snapshot of files...
INFO[0000] Saving file f1 for later use
INFO[0000] Deleting filesystem...
INFO[0000] No base image, nothing to extract
INFO[0000] Executing 0 build triggers
INFO[0000] Unpacking rootfs as cmd COPY --from=c1 /f1 /f1 requires it.
INFO[0000] COPY --from=c1 /f1 /f1
INFO[0000] Taking snapshot of files...
error building image: error building stage: failed to get files used from context: failed to get fileinfo for /opt/gitlab-runner/builds/infrastructure/bug-report-multistage/f2: lstat /opt/gitlab-runner/builds/infrastructure/bug-report-multistage/f2: no such file or directory
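The first ls (in before_script) succeeds, but the second one, run after the executor, fails. Large parts of the container filesystem are missing, although the pipeline is still running in bash. The ** glob is not expanded, which means nothing in the working directory matches it: the repo is gone.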
$ ls -la || true
bash: line 130: /bin/ls: No such file or directory
$ echo **
**
$ echo ${SHELL}
/bin/bash
$ echo /**/*
/dev/core /dev/fd /dev/full /dev/mqueue /dev/null /dev/ptmx /dev/pts /dev/random /dev/shm /dev/stderr /dev/stdin /dev/stdout /dev/tty /dev/urandom /dev/zero /etc/hostname /etc/hosts /etc/mtab /etc/resolv.conf /kaniko/0 /kaniko/457978159 /kaniko/903179970 /kaniko/Dockerfile /kaniko/docker-credential-acr /kaniko/docker-credential-ecr-login /kaniko/docker-credential-gcr /kaniko/executor /kaniko/ssl /proc/1 /proc/105 /proc/107 /proc/109 /proc/39 /proc/9 /proc/acpi /proc/buddyinfo /proc/bus /proc/cgroups /proc/cmdline /proc/consoles /proc/cpuinfo /proc/crypto /proc/devices /proc/diskstats /proc/dma /proc/driver /proc/execdomains /proc/filesystems /proc/fs /proc/interrupts /proc/iomem /proc/ioports /proc/irq /proc/kallsyms /proc/kcore /proc/key-users /proc/keys /proc/kmsg /proc/kpagecgroup /proc/kpagecount /proc/kpageflags /proc/latency_stats /proc/loadavg /proc/locks /proc/mdstat /proc/meminfo /proc/misc /proc/modules /proc/mounts /proc/mtrr /proc/net /proc/pagetypeinfo /proc/partitions /proc/sched_debug /proc/schedstat /proc/scsi /proc/self /proc/slabinfo /proc/softirqs /proc/stat /proc/swaps /proc/sys /proc/sysrq-trigger /proc/sysvipc /proc/thread-self /proc/timer_list /proc/tty /proc/uptime /proc/version /proc/vmallocinfo /proc/vmstat /proc/xen /proc/zoneinfo /sys/block /sys/bus /sys/class /sys/dev /sys/devices /sys/firmware /sys/fs /sys/hypervisor /sys/kernel /sys/module /sys/power /var/run
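Since /bin/ls no longer exists at this point, only shell builtins are left for inspecting the filesystem. The echo globs above work for that. A slightly more readable variant, as a sketch (the function name list_dir is made up for illustration), prints one entry per line and also covers dotfiles, which a bare * or ** would not match:

```shell
#!/usr/bin/env bash
# Sketch: inspect a directory using only bash builtins (no /bin/ls needed).
list_dir() {
  local dir="${1:-.}" entry
  for entry in "$dir"/* "$dir"/.[!.]*; do
    # An unmatched glob stays literal, so skip entries that do not exist.
    [ -e "$entry" ] || [ -L "$entry" ] || continue
    printf '%s\n' "$entry"
  done
}

# Example: show what is left in the job's working directory.
list_dir "${CI_PROJECT_DIR:-.}"
```

If this prints nothing for the project directory, both the checked-out files and hidden entries such as .git are gone.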
What is the expected correct behavior?
Multistage builds work as expected and the environment is usable after the build finishes.
Relevant logs and/or screenshots
Full pipeline output is here: https://gitlab.com/-/snippets/2254920
Side note: See the error on the last line - is that normal? That's how every build ends but it seems to have no effect. (Probably because the container is shut down.) Still, weird that this isn't caught.
Output of checks
Not applicable.
Results of GitLab environment info
System information
System: Ubuntu 20.04
Current User: git
Using RVM: no
Ruby Version: 2.7.5p203
Gem Version: 3.1.4
Bundler Version: 2.1.4
Rake Version: 13.0.6
Redis Version: 6.0.16
Git Version: 2.33.1
Sidekiq Version: 6.2.2
Go Version: unknown

GitLab information
Version: 14.5.2
Revision: 76ceea558aa
Directory: /opt/gitlab/embedded/service/gitlab-rails
DB Adapter: PostgreSQL
DB Version: 12.7
URL: https://git.censored.com
HTTP Clone URL: https://git.censored.com/some-group/some-project.git
SSH Clone URL: git@git.censored.com:some-group/some-project.git
Using LDAP: no
Using Omniauth: yes
Omniauth Providers:

GitLab Shell
Version: 13.22.1
Repository storage paths:
- default: /var/opt/gitlab/git-data/repositories
GitLab Shell path: /opt/gitlab/embedded/service/gitlab-shell
Git: /opt/gitlab/embedded/bin/git
Results of GitLab application Check
Checking GitLab subtasks ...
Checking GitLab Shell ...
GitLab Shell: ... GitLab Shell version >= 13.22.1 ? ... OK (13.22.1)
Running /opt/gitlab/embedded/service/gitlab-shell/bin/check
Internal API available: OK
Redis available via internal API: OK
gitlab-shell self-check successful
Checking GitLab Shell ... Finished
Checking Gitaly ...
Gitaly: ... default ... OK
Checking Gitaly ... Finished
Checking Sidekiq ...
Sidekiq: ... Running? ... yes
Number of Sidekiq processes (cluster/worker) ... 1/1
Checking Sidekiq ... Finished
Checking Incoming Email ...
Incoming Email: ... Reply by email is disabled in config/gitlab.yml
Checking Incoming Email ... Finished
Checking LDAP ...
LDAP: ... LDAP is disabled in config/gitlab.yml
Checking LDAP ... Finished
Checking GitLab App ...
Git configured correctly? ... yes
Database config exists? ... yes
All migrations up? ... yes
Database contains orphaned GroupMembers? ... no
GitLab config exists? ... yes
GitLab config up to date? ... yes
Log directory writable? ... yes
Tmp directory writable? ... yes
Uploads directory exists? ... yes
Uploads directory has correct permissions? ... yes
Uploads directory tmp has correct permissions? ... yes
Systemd unit files or init script exist? ... skipped (omnibus-gitlab has neither init script nor systemd units)
Systemd unit files or init script up-to-date? ... skipped (omnibus-gitlab has neither init script nor systemd units)
Projects have namespace: ...
2/1 ... yes
1/2 ... yes
8/3 ... yes
11/4 ... yes
11/6 ... yes
13/7 ... yes
13/8 ... yes
13/9 ... yes
13/10 ... yes
13/11 ... yes
13/12 ... yes
13/13 ... yes
13/14 ... yes
13/15 ... yes
20/16 ... yes
13/17 ... yes
14/18 ... yes
13/19 ... yes
13/20 ... yes
14/21 ... yes
23/22 ... yes
21/23 ... yes
21/24 ... yes
14/25 ... yes
21/26 ... yes
18/27 ... yes
8/28 ... yes
14/31 ... yes
13/32 ... yes
13/33 ... yes
8/34 ... yes
Redis version >= 5.0.0? ... yes
Ruby version >= 2.7.2 ? ... yes (2.7.5)
Git version >= 2.33.0 ? ... yes (2.33.1)
Git user has default SSH configuration? ... yes
Active users: ... 13
Is authorized keys file accessible? ... yes
GitLab configured to store new projects in hashed storage? ... yes
All projects are in hashed storage? ... yes
Checking GitLab App ... Finished
Checking GitLab subtasks ... Finished
Possible fixes
We've tried some workarounds. For now, we're copying everything into a temporary stage and then using COPY --from to pull whatever each of the other stages needs out of it. We also moved the executor call to the last line of the CI step, which means we're updating the image version before the build has succeeded: pretty terrible. We're still demoing the concept and this instance, though, and unfortunately it doesn't look production ready. I'll open more reports about this.
No fixes.
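For reference, the temp-stage workaround applied to the reproduction Dockerfile might look roughly like the sketch below. The stage name ctx and the /ctx path are placeholders, and our real Dockerfiles differ. The script only writes the file. It does not run the executor.

```shell
#!/usr/bin/env bash
# Sketch of the temp-stage workaround; "ctx" and "/ctx" are hypothetical names.
set -u

out_dir="$(mktemp -d)"

cat > "$out_dir/Dockerfile" <<'EOF'
# Stage 0: capture the whole build context while it still exists.
FROM scratch AS ctx
COPY . /ctx

FROM scratch AS c1
COPY --from=ctx /ctx/f1 /f1

# Later stages read from ctx instead of the build context directory.
FROM scratch AS c2
COPY --from=c1 /f1 /f1
COPY --from=ctx /ctx/f2 /f2
EOF

echo "wrote $out_dir/Dockerfile"
```

Only the first stage touches the build context directly, so later stages no longer depend on the repository directory still being present.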