
WIP: Native support for kaniko in .gitlab-ci.yml

What does this MR do?

Background

Some time ago we had a discussion with @ayufan about privileged mode for the Docker executor in GitLab Runner, and the problems it creates for shared and heavily used environments like, for example, Shared Runners on GitLab.com. So why did we decide to use it, and why do we consider it problematic in such a case?

The main reason why we decided to configure Shared Runners on GitLab.com with privileged = true is that this is the only way to run Docker-in-Docker jobs, and for a long time this was the only way to build Docker images on our Shared Runners. When we introduced the Container Registry as part of GitLab in May 2016, the possibility of building Docker images inside GitLab CI became a must-have.

But privileged = true brings a big problem to a CI system. Generally speaking, it disables many of Docker's internal security mechanisms, which means that escaping a container is much easier than it normally would be. This is one of the reasons why Shared Runners on GitLab.com are configured to remove the autoscaled machine after each executed job. With privileged = true there is just too much risk that someone would escape the container and affect the host VM, which could next be used by another user in another project (so this opens up the possibility of accessing projects that one normally doesn't have access to).

Both of these things, the need to allow Docker builds on our CI and at the same time the security problems created by privileged = true, generate further problems. We already know that the autoscaling configuration for GitLab.com Shared Runners, which uses the docker+machine executor, is neither efficient nor easy to maintain at the scale we're operating at. We'd very much like to migrate to the Kubernetes executor and benefit from the Kubernetes Cluster Autoscaler. But to make efficient use of Kubernetes we can't just pin each job to its own VM host, and as already described, privileged = true in such a setup is just a bad idea.

In that discussion, one of @ayufan's statements was:

I would generally look at improving our docker-build practices to use kaniko now, instead of docker build which will remove the need for privileged flag.

Since I hadn't used Kaniko yet, I wanted to give it a try and experiment a little over a weekend. Looking at how Kaniko works and how to use it in GitLab CI, I found that we've already started documenting this: https://docs.gitlab.com/ee/ci/docker/using_kaniko.html. So I created a test project, copied the proposed configuration and pushed a commit. And it just worked! Yay!

But looking at the content of the .gitlab-ci.yml file, I became less happy. If we really want to encourage our users to switch from the currently well-documented and broadly used DinD-based Docker builds to Kaniko-based Docker builds, we need to make it much easier.

From the user's perspective, the DinD-based configuration is very simple. One just needs to define services: [docker:dind] for the job and ensure that the job will be executed on a Docker executor with privileged = true. Then, in the job's script, one can just replicate what they would do locally:

  • docker login ...
  • docker build ...
  • docker push ...

Let's then look at two different definitions of the same Docker image build:

  1. Using Docker-in-Docker

    build docker image:
      services:
      - docker:dind
      script:
      - docker login --username $CI_REGISTRY_USER --password $CI_REGISTRY_PASSWORD $CI_REGISTRY
      - docker build -t $CI_REGISTRY_IMAGE .
      - docker push $CI_REGISTRY_IMAGE
  2. Using Kaniko

    build docker image:
      image:
        name: gcr.io/kaniko-project/executor:debug
        entrypoint: [""]
      script:
      - echo "{\"auths\":{\"$CI_REGISTRY\":{\"username\":\"$CI_REGISTRY_USER\",\"password\":\"$CI_REGISTRY_PASSWORD\"}}}" > /kaniko/.docker/config.json
      - |
        /kaniko/executor \
          --context $CI_PROJECT_DIR \
          --dockerfile $CI_PROJECT_DIR/Dockerfile \
          --destination $CI_REGISTRY_IMAGE

The problems I see:

  1. The need to learn a new way of defining jobs. Well, to be honest, this will be the case for any approach other than Docker on a Shell executor or the Docker-in-Docker setup mentioned here. But the next point is that...
  2. The Kaniko-based configuration is just ugly, especially the need to manually create the content of the configuration file, escaping the " characters so that it can be echoed properly.
  3. The /kaniko/executor invocation itself is also less user-friendly than docker build: it requires an explicit path for the context (because by default it's /workspace/), and the same goes for the --dockerfile flag. --destination is also not the same as --tag in docker build, since it builds and pushes the image by default; if one wants only to build (e.g. to test whether the Dockerfile definition is correct), --no-push needs to be added. A build-only invocation is sketched below.
  4. One needs to remember that this job will work only with the gcr.io/kaniko-project/executor:debug image, and only when the entrypoint is cleared.
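
For example, a build-only invocation from point 3, one that merely validates that the Dockerfile builds, would look something like this (note that every flag still has to be spelled out explicitly):

/kaniko/executor \
  --context $CI_PROJECT_DIR \
  --dockerfile $CI_PROJECT_DIR/Dockerfile \
  --no-push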

And this is what brought me to experimenting with the content of this MR.

So what does this MR really do?

This is a snapshot of my experimentation to prepare a PoC of, let's call it, native support for Kaniko in GitLab CI. I did some experiments over the weekend and just wanted to save this work somewhere, in case we're interested in moving it forward.

With this MR, the exact same build as defined above with Docker-in-Docker and with explicit Kaniko usage would be:

build docker image:
  kaniko:

and nothing else! It instructs GitLab CI to prepare a Kaniko build definition, using $CI_REGISTRY, $CI_REGISTRY_USER and $CI_REGISTRY_PASSWORD as the authorization credentials, the root directory as the context, the Dockerfile in the root directory as the dockerfile, and $CI_REGISTRY_IMAGE as the only tag to be pushed. All by default.
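
Based on the generated-script format shown later in this description, this minimal definition would expand to script lines roughly like the following (a sketch of the defaults described above, not the literal implementation output):

echo "{\"auths\":{\"$CI_REGISTRY\":{\"username\":\"$CI_REGISTRY_USER\",\"password\":\"$CI_REGISTRY_PASSWORD\"}}}" > /kaniko/.docker/config.json
/kaniko/executor --context $CI_PROJECT_DIR --dockerfile $CI_PROJECT_DIR/Dockerfile --destination $CI_REGISTRY_IMAGE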

For many projects this would be the only thing that needs to be defined to build and publish the image.

But since not every Docker project is as simple as building the Dockerfile from the project's root directory and tagging it as $CI_REGISTRY_IMAGE, the kaniko: configuration allows a lot more:

build docker image:
  kaniko:
    credentials:
    - registry: registry1.example.com
      username: user1
      password: $REGISTRY1_EXAMPLE_COM_PASSWORD
    - registry: registry2.example.com
      username: user2
      password: $REGISTRY2_EXAMPLE_COM_PASSWORD
    images:
    - context: dockerfiles/nginx
      dockerfile: dockerfiles/nginx
      args:
        NGINX_VERSION: 1.2.3
      tags:
      - $CI_REGISTRY_IMAGE/nginx
      - registry1.example.com/my/nginx
      - registry2.example.com/some/nginx
    - context: dockerfiles/acme
      dockerfile: dockerfiles/acme
      tags:
      - $CI_REGISTRY_IMAGE/acme
      - registry1.example.com/my/acme
      - registry2.example.com/some/acme

In this case, the job will automatically:

  1. Prepare credential entries for /kaniko/.docker/config.json based on the credentials entry. Please notice that there is no definition for $CI_REGISTRY that would use $CI_REGISTRY_USER and $CI_REGISTRY_PASSWORD; that's because I've assumed that the configuration for the internal registry should always be added. (A sketch of the resulting config.json follows after this list.)

  2. Build and push the image from dockerfiles/nginx/Dockerfile as $CI_REGISTRY_IMAGE/nginx, registry1.example.com/my/nginx and registry2.example.com/some/nginx (that's why credentials for registry1 and registry2 needed to be configured).

  3. Build and push the image from dockerfiles/acme/Dockerfile as $CI_REGISTRY_IMAGE/acme, registry1.example.com/my/acme and registry2.example.com/some/acme.
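
For illustration, the /kaniko/.docker/config.json generated from the configuration above might look roughly like this (a sketch; the $CI_REGISTRY entry is the always-added internal registry configuration, and all variables would be expanded by the shell):

{
  "auths": {
    "$CI_REGISTRY": {"username": "$CI_REGISTRY_USER", "password": "$CI_REGISTRY_PASSWORD"},
    "registry1.example.com": {"username": "user1", "password": "$REGISTRY1_EXAMPLE_COM_PASSWORD"},
    "registry2.example.com": {"username": "user2", "password": "$REGISTRY2_EXAMPLE_COM_PASSWORD"}
  }
}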

What's most important is that this job will work with the Docker, Docker Machine and Kubernetes executors, with no change needed in Runner. That's because internally the kaniko: definition is transformed into script lines that are appended to script (or, if script was not defined, used as the script). So from Runner's perspective there is nothing new, just another script sent from GitLab. But from the user's perspective there is no need to create this script explicitly; a structured configuration is used instead.

Appending the Kaniko build definition to script also has another interesting effect: the Kaniko configuration may use any variable created as part of the script, e.g.:

build docker image:
  kaniko:
    images:
    - args:
        NGINX_VERSION: $NGINX_VERSION
  script:
  - export NGINX_VERSION=$(curl -s http://example.com/detect-nginx-version)

In this case, the final script that will be sent to the Runner will look like this:

export NGINX_VERSION=$(curl -s http://example.com/detect-nginx-version)
echo "{\"auths\":{\"$CI_REGISTRY\":{\"username\":\"$CI_REGISTRY_USER\",\"password\":\"$CI_REGISTRY_PASSWORD\"}}}" > /kaniko/.docker/config.json
/kaniko/executor --context $CI_PROJECT_DIR/. --dockerfile $CI_PROJECT_DIR/Dockerfile --destination $CI_REGISTRY_IMAGE --build-arg NGINX_VERSION=$NGINX_VERSION

So first, curl -s http://example.com/detect-nginx-version, which detects the expected Nginx version, is executed and its result is exported as the NGINX_VERSION variable; next, this variable is used to specify the NGINX_VERSION build argument for the Docker image build. Since all lines are executed in the same shell context, the variable is available to the /kaniko/executor call.

Open questions and what needs to be solved before merging this

  1. Currently, $CI_REGISTRY authentication entries will be added even if the internal registry is disabled. This may end with an "":{"username":"","password":""} entry in the config.json auths hash. Is it a problem if $CI_REGISTRY is not used as a push target? Probably not. But it may be better not to set this configuration, as well as the default $CI_REGISTRY_IMAGE for images:tags:, when the internal registry is not enabled.

  2. With the current implementation, if dockerfile: is not specified, it defaults to $CI_PROJECT_DIR/Dockerfile, while it should probably default to [value computed from context]/Dockerfile.

  3. Which other features of Kaniko should we support? I think at least --no-push, to provide a way of building an image without pushing it (e.g. on a feature branch one only wants to test whether the Dockerfile builds, but on master one wants the image also pushed to the registry); a hypothetical sketch of this case follows after this list. Probably also --insecure, or rather --skip-tls-verify, to support users who use self-signed certificates on internal infrastructure.

  4. Is kaniko: the best name for this config entry? I started this MR as native support for Kaniko, but maybe we should think about a more generic name. With one, in the future we could switch to another, better tool than Kaniko, or maybe even detect and support container systems other than Docker, keeping the base build configuration unchanged. The same goes for dockerfile:, since it's very Docker-centric. Maybe image_definition: would be a better name?

  5. Proper implementation. The current one looks a little hacky in a few places. If we decide to go this way, some parts of the implementation will probably need polishing, to make them clean and not introduce ~"technical debt".

  6. TESTS! I didn't add any tests for the new config entries, nor have I updated the Entry::Job tests. I've only made a small change in the YamlProcessor specs, and only to have an easy way of experimenting with what my changes introduce into the finally parsed job configuration. If we decide to move this forward, proper tests need to be added!
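
To make point 3 above a bit more concrete, here is one hypothetical shape such a configuration could take; nothing like this is implemented in this MR, and the push: key and its name are pure assumption:

test docker image build:
  kaniko:
    push: false    # hypothetical option that would translate to /kaniko/executor --no-push
  except:
  - master

build docker image:
  kaniko:
  only:
  - master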

What are the relevant issue numbers?

After preparing the base implementation, I thought that I'm probably not the first person to think about implementing native support for building containers in GitLab CI. After a quick search I found #48913 (moved), which as the long-term solution proposes something similar to what I'm proposing here. It's already set as direction and ~"Product Vision 2019" and scheduled for %"2019 Q2". In fact, the Building images with Kaniko and GitLab CI/CD documentation I mentioned above was linked as a first step for this issue.

I think this MR could be a good candidate to serve as a base for resolving that issue.

Does this MR meet the acceptance criteria?

