
Kaniko multistage builds failing due to missing files

Summary

GitLab version: 14.5.2
Kaniko version: 1.7.0

To build Docker containers for use in AWS, we run autoscaling CI on Fargate following the official design document: https://docs.gitlab.com/runner/configuration/runner_autoscale_aws_fargate/index.html

Our Dockerfile for the CI builder is derived from one of the official examples, adding only some scripts and extra tools: https://gitlab.com/aws-fargate-driver-demo/docker-kaniko-gitlab-ci-fargate/-/blob/master/alpine/Dockerfile

When building multistage Dockerfiles, Kaniko deletes the files inside the build container after the first stage's snapshot is taken. All subsequent stages then have no access to the repository checkout, and once the executor finishes, every command except the Kaniko-provided binaries is missing from the container.

This is likely not a bug in Kaniko itself - the behavior makes sense given my understanding of its mechanism (it wipes the build container's filesystem before unpacking the next stage's rootfs) - but I can find no existing issue reports about it, which seems unrealistic for something this fundamental.
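
For illustration, here is a conceptual sketch of what the "Deleting filesystem..." step appears to do. This is not Kaniko source code, and the allowlist is an assumption inferred from what survives in the transcript further below:

# NOT kaniko code - a conceptual sketch of the observed behavior. Between
# stages, everything outside a small allowlist is removed from the build
# container; on a shared runner this takes /bin, /usr and the CI checkout
# with it. KEEP approximates the paths that survive in our transcript.
KEEP='/kaniko /proc /sys /dev /etc /var'
for p in /*; do
  case " $KEEP " in
    *" $p "*) ;;        # allowlisted: keep
    *) rm -rf "$p" ;;   # everything else is wiped
  esac
done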

Steps to reproduce

  • Create /root/.docker/config.json.tpl in the CI builder image with the following content (it is rendered with envsubst in the before_script below; see the rendering sketch after these steps):
{
    "auths": {
        "${CI_REGISTRY}": {"auth": "${CI_LOGIN}"}
    },
    "credHelpers": {
        "${AWS_ACCOUNT_ID}.dkr.ecr.${AWS_REGION}.amazonaws.com": "ecr-login"
    }
}
  • Build the CI builder image and make it available as the CI coordinator as per the guide. You can likely strip out all the ECR-related parts, but I'm leaving them in for completeness.
  • Create a new repo with this .gitlab-ci.yml in the GitLab instance set up to use Fargate for its builds:
Build:
  stage: build
  before_script:
    - source /context
    - export CI_LOGIN="$(printf "%s:%s"
                                "${CI_REGISTRY_USER}" "${CI_REGISTRY_PASSWORD}"
                         | base64 | tr -d '\n')"
    - ls -la
    - envsubst < /root/.docker/config.json.tpl > /root/.docker/config.json
  script:
    - executor --context "dir://${CI_PROJECT_DIR}"
               --dockerfile Dockerfile
               --destination "${CI_REGISTRY_IMAGE}:latest" || true
    - ls -la || true
    - echo **
    - echo ${SHELL}
    - echo /**/*
  • Create a Dockerfile in the repo root:
FROM scratch AS c1

COPY ./f1 /f1

FROM scratch AS c2

COPY --from=c1 /f1 /f1
COPY ./f2 /f2
  • Create the two files: touch f1 f2
  • Commit and let the build run.
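
For reference, here is what the rendering step in before_script does to the template (values hypothetical; only CI_LOGIN is computed, the rest are provided by GitLab CI):

export CI_REGISTRY="git.example.com"
export CI_REGISTRY_USER="gitlab-ci-token" CI_REGISTRY_PASSWORD="secret"
export AWS_ACCOUNT_ID="123456789012" AWS_REGION="eu-central-1"
export CI_LOGIN="$(printf '%s:%s' "$CI_REGISTRY_USER" "$CI_REGISTRY_PASSWORD" | base64 | tr -d '\n')"
envsubst < /root/.docker/config.json.tpl > /root/.docker/config.json
# config.json now contains the literal registry host and the base64 auth string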

Example Project

Not applicable.

What is the current bug behavior?

This is the important part of the Kaniko output:

INFO[0000] Unpacking rootfs as cmd COPY ./f1 /f1 requires it. 
INFO[0000] COPY ./f1 /f1                                
INFO[0000] Taking snapshot of files...                  
INFO[0000] Saving file f1 for later use                 
INFO[0000] Deleting filesystem...                       
INFO[0000] No base image, nothing to extract            
INFO[0000] Executing 0 build triggers                   
INFO[0000] Unpacking rootfs as cmd COPY --from=c1 /f1 /f1 requires it. 
INFO[0000] COPY --from=c1 /f1 /f1                       
INFO[0000] Taking snapshot of files...                  
error building image: error building stage: failed to get files used from context: failed to get fileinfo for /opt/gitlab-runner/builds/infrastructure/bug-report-multistage/f2: lstat /opt/gitlab-runner/builds/infrastructure/bug-report-multistage/f2: no such file or directory

The first ls (in before_script) succeeds, but the second one fails. Huge parts of the container filesystem are missing, yet the pipeline is still running in bash. ** is echoed back literally, meaning nothing matched (see the globbing note after the transcript), so the repository checkout is gone.

$ ls -la || true
bash: line 130: /bin/ls: No such file or directory
$ echo **
**
$ echo ${SHELL}
/bin/bash
$ echo /**/*
/dev/core /dev/fd /dev/full /dev/mqueue /dev/null /dev/ptmx /dev/pts /dev/random /dev/shm /dev/stderr /dev/stdin /dev/stdout /dev/tty /dev/urandom /dev/zero /etc/hostname /etc/hosts /etc/mtab /etc/resolv.conf /kaniko/0 /kaniko/457978159 /kaniko/903179970 /kaniko/Dockerfile /kaniko/docker-credential-acr /kaniko/docker-credential-ecr-login /kaniko/docker-credential-gcr /kaniko/executor /kaniko/ssl /proc/1 /proc/105 /proc/107 /proc/109 /proc/39 /proc/9 /proc/acpi /proc/buddyinfo /proc/bus /proc/cgroups /proc/cmdline /proc/consoles /proc/cpuinfo /proc/crypto /proc/devices /proc/diskstats /proc/dma /proc/driver /proc/execdomains /proc/filesystems /proc/fs /proc/interrupts /proc/iomem /proc/ioports /proc/irq /proc/kallsyms /proc/kcore /proc/key-users /proc/keys /proc/kmsg /proc/kpagecgroup /proc/kpagecount /proc/kpageflags /proc/latency_stats /proc/loadavg /proc/locks /proc/mdstat /proc/meminfo /proc/misc /proc/modules /proc/mounts /proc/mtrr /proc/net /proc/pagetypeinfo /proc/partitions /proc/sched_debug /proc/schedstat /proc/scsi /proc/self /proc/slabinfo /proc/softirqs /proc/stat /proc/swaps /proc/sys /proc/sysrq-trigger /proc/sysvipc /proc/thread-self /proc/timer_list /proc/tty /proc/uptime /proc/version /proc/vmallocinfo /proc/vmstat /proc/xen /proc/zoneinfo /sys/block /sys/bus /sys/class /sys/dev /sys/devices /sys/firmware /sys/fs /sys/hypervisor /sys/kernel /sys/module /sys/power /var/run
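
To read that transcript: this is plain bash globbing. Without nullglob or failglob, an unmatched pattern is echoed back literally, and ** without globstar matches like *, so ** printing itself means the working directory is empty:

$ cd "$(mktemp -d)"   # empty directory
$ echo **
**
$ touch f1
$ echo **
f1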

What is the expected correct behavior?

Multistage builds work as expected and the environment is usable after the build finishes.

Relevant logs and/or screenshots

Full pipeline output is here: https://gitlab.com/-/snippets/2254920

Side note: see the error on the last line of that output - is that normal? Every build ends that way, but it seems to have no effect (probably because the container is already shutting down). Still, it's odd that this isn't caught.

Output of checks

Not applicable.

Results of GitLab environment info


System information
System:         Ubuntu 20.04
Current User:   git
Using RVM:      no
Ruby Version:   2.7.5p203
Gem Version:    3.1.4
Bundler Version:2.1.4
Rake Version:   13.0.6
Redis Version:  6.0.16
Git Version:    2.33.1
Sidekiq Version:6.2.2
Go Version:     unknown

GitLab information
Version:        14.5.2
Revision:       76ceea558aa
Directory:      /opt/gitlab/embedded/service/gitlab-rails
DB Adapter:     PostgreSQL
DB Version:     12.7
URL:            https://git.censored.com
HTTP Clone URL: https://git.censored.com/some-group/some-project.git
SSH Clone URL:  git@git.censored.com:some-group/some-project.git
Using LDAP:     no
Using Omniauth: yes
Omniauth Providers:

GitLab Shell
Version:        13.22.1
Repository storage paths:
- default:      /var/opt/gitlab/git-data/repositories
GitLab Shell path:              /opt/gitlab/embedded/service/gitlab-shell
Git:            /opt/gitlab/embedded/bin/git

Results of GitLab application Check


Checking GitLab subtasks ...

Checking GitLab Shell ...

GitLab Shell: ... GitLab Shell version >= 13.22.1 ? ... OK (13.22.1)
Running /opt/gitlab/embedded/service/gitlab-shell/bin/check
Internal API available: OK
Redis available via internal API: OK
gitlab-shell self-check successful

Checking GitLab Shell ... Finished

Checking Gitaly ...

Gitaly: ... default ... OK

Checking Gitaly ... Finished

Checking Sidekiq ...

Sidekiq: ... Running? ... yes
Number of Sidekiq processes (cluster/worker) ... 1/1

Checking Sidekiq ... Finished

Checking Incoming Email ...

Incoming Email: ... Reply by email is disabled in config/gitlab.yml

Checking Incoming Email ... Finished

Checking LDAP ...

LDAP: ... LDAP is disabled in config/gitlab.yml

Checking LDAP ... Finished

Checking GitLab App ...

Git configured correctly? ... yes
Database config exists? ... yes
All migrations up? ... yes
Database contains orphaned GroupMembers? ... no
GitLab config exists? ... yes
GitLab config up to date? ... yes
Log directory writable? ... yes
Tmp directory writable? ... yes
Uploads directory exists? ... yes
Uploads directory has correct permissions? ... yes
Uploads directory tmp has correct permissions? ... yes
Systemd unit files or init script exist? ... skipped (omnibus-gitlab has neither init script nor systemd units)
Systemd unit files or init script up-to-date? ... skipped (omnibus-gitlab has neither init script nor systemd units)
Projects have namespace: ...
2/1 ... yes
1/2 ... yes
8/3 ... yes
11/4 ... yes
11/6 ... yes
13/7 ... yes
13/8 ... yes
13/9 ... yes
13/10 ... yes
13/11 ... yes
13/12 ... yes
13/13 ... yes
13/14 ... yes
13/15 ... yes
20/16 ... yes
13/17 ... yes
14/18 ... yes
13/19 ... yes
13/20 ... yes
14/21 ... yes
23/22 ... yes
21/23 ... yes
21/24 ... yes
14/25 ... yes
21/26 ... yes
18/27 ... yes
8/28 ... yes
14/31 ... yes
13/32 ... yes
13/33 ... yes
8/34 ... yes
Redis version >= 5.0.0? ... yes
Ruby version >= 2.7.2 ? ... yes (2.7.5)
Git version >= 2.33.0 ? ... yes (2.33.1)
Git user has default SSH configuration? ... yes
Active users: ... 13
Is authorized keys file accessible? ... yes
GitLab configured to store new projects in hashed storage? ... yes
All projects are in hashed storage? ... yes

Checking GitLab App ... Finished

Checking GitLab subtasks ... Finished

Possible fixes

We've tried some workarounds. For now, we copy the entire build context into a temporary first stage and then COPY --from that stage into the other stages whatever we need for the build (a sketch follows below). The executor invocation was also moved to be the last line in the CI step, which means we update the image version before the build has succeeded - pretty terrible. We're still evaluating the concept and this instance in general, though, and unfortunately it doesn't look production-ready. I'll open more reports about this.
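
A sketch of the temp-stage workaround applied to the repro Dockerfile above (the stage name context is hypothetical). Only the first stage reads from the on-disk build context, before Kaniko deletes the filesystem; every later stage uses COPY --from:

cat > Dockerfile <<'EOF'
FROM scratch AS context
# Read the build context exactly once, before anything is deleted.
COPY . /

FROM scratch AS c1
COPY --from=context /f1 /f1

FROM scratch AS c2
COPY --from=c1 /f1 /f1
# No "COPY ./f2" here - f2 comes from the snapshot of the context stage.
COPY --from=context /f2 /f2
EOF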

No actual fixes known.