Draft: GDK docker image build improvements

Andrejs Cunskis requested to merge andrey-gdk-improvements into master

What does this MR do and why?

PoC for an executable GDK image:

  • multi-stage docker image, which allows different components to be built in parallel
  • adds a gdk base image, rebuilt on component or dependency changes, and a final gdk image built on top of it to produce the executable image
  • the final image is an executable artifact tagged with a specific commit sha
  • the pipeline setup consumes the gdk image through native ci services functionality
  • reduced image size:
registry.gitlab.com/gitlab-org/gitlab/gitlab-qa-gdk   andrey-gdk-improvements   9c74ce32a005   2 hours ago   8.18GB
registry.gitlab.com/gitlab-org/gitlab/gitlab-qa-gdk   master                    86f5a5b9ece6   5 days ago    20.4GB

With the image split into base and gdk, the image size is larger because the go cache cannot be cleared properly:

registry.gitlab.com/gitlab-org/gitlab/gitlab-qa-gdk        2d3041bd210ad7868083659460e7700506b86013   4f0dd428355c   22 minutes ago   9.8GB
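
To put the three listings in perspective, here is a small sketch of the arithmetic. The sizes are copied from the `docker images` output above; the function names are made up for illustration only.

```python
# Sizes in GB, copied from the `docker images` listings above.
MASTER_GB = 20.4         # current master image
SINGLE_IMAGE_GB = 8.18   # andrey-gdk-improvements, single image
SPLIT_IMAGE_GB = 9.8     # image split into base and gdk


def reduction_pct(before_gb: float, after_gb: float) -> float:
    """Relative size reduction in percent, rounded to one decimal."""
    return round((1 - after_gb / before_gb) * 100, 1)


def split_overhead_gb(single_gb: float, split_gb: float) -> float:
    """Extra size paid for splitting into base and gdk images, in GB."""
    return round(split_gb - single_gb, 2)


if __name__ == "__main__":
    print("single image vs master:", reduction_pct(MASTER_GB, SINGLE_IMAGE_GB), "%")
    print("split image vs master: ", reduction_pct(MASTER_GB, SPLIT_IMAGE_GB), "%")
    print("cost of the split:     ", split_overhead_gb(SINGLE_IMAGE_GB, SPLIT_IMAGE_GB), "GB")
```

So the single image is roughly 60% smaller than master, the split variant roughly 52% smaller, and the split itself costs about 1.62GB.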

Issues/Improvements

  • Creating/fetching/uploading the cache takes a lot of time due to the complex dockerfile and its many steps/stages. It might be worth exploring building with plain buildkit, as it allows using a newer version and, from my experience, handles caching better, but we lose the ability to build for the arm architecture (we install the latest QEMU via a docker container). Updating docker and buildx might also bring some improvements, but in the end the network seems to be the bottleneck: it takes a while to upload the image. Another option is to build the base image only on master runs and just rerun gdk install, but this makes the setup less portable and less "correct".
  • gdk is not well designed for running different stages in parallel and then combining the results (for example, gitaly is always rebuilt when just running the db task, which forces all the gitaly build deps to be passed between images and makes the result 700MB larger)
  • Couldn't get workhorse to work properly when binding to the 0.0.0.0 IP address. nginx fixes the issue but adds another dependency (though it also makes it possible to set up https)
  • There are around 2GB of gems in the image. Identifying which ones are not runtime dependencies could reduce the size. Development and test gems are probably not necessary either, but removing them might require running gdk in production mode.
  • Currently the spec folder, which is almost 100MB, has to be included in the image because some of the rake tasks fail to load if it is not present. There are guards that skip loading those tasks when the environment is production, so if we can run gdk in production mode, less code needs to be copied
  • GDK is slow to boot: about 3 minutes, which will fail the integrated ci healthcheck for services (30s timeout)
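
One possible workaround for the slow boot, sketched here only as an idea and not part of this MR: have the job poll the service itself with a longer deadline instead of relying on the built-in 30s check. The `wait_until_ready` and `http_probe` helpers, the 300s default and the URL are all hypothetical.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_probe(url: str) -> bool:
    """Return True if the URL answers with HTTP 200 within 5 seconds."""
    try:
        with urllib.request.urlopen(url, timeout=5) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, non-2xx response: not ready yet.
        return False


def wait_until_ready(
    probe: Callable[[], bool],
    timeout_s: float = 300.0,   # assumption: comfortably above the ~3 minute boot
    interval_s: float = 5.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call probe() every interval_s seconds until it succeeds or timeout_s passes."""
    deadline = clock() + timeout_s
    while True:
        if probe():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)


if __name__ == "__main__":
    # Hypothetical service alias and port; adjust to the actual ci setup.
    url = "http://gdk.test:3000/"
    ready = wait_until_ready(lambda: http_probe(url))
    raise SystemExit(0 if ready else 1)
```

The clock and sleep functions are injectable so the loop can be exercised without actually waiting; with a 30s deadline and a service that needs 180s, the same loop gives up, which mirrors the failure described above.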

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

Edited by Andrejs Cunskis