# Allow the Dependency Proxy to cache environments
## Problem to solve
Allow the Dependency Proxy to cache "environments" (which would include packages) and intelligently invalidate these to speed up pipeline running times.
Author's note: to set some context, this idea came from a discussion on how we could cache packages from the package registry as part of CI/CD builds. This issue is an alternative idea to tackle the problem. I also don't know how feasible this is and no investigation has been done yet.
## Intended users

- Sasha (Software Developer)
- Devon (DevOps Engineer)
- Sidney (Systems Administrator)
- Simone (Software Engineer in Test)
## Proposal

As an alternative to caching packages via a series of proxies, what if the Dependency Proxy were capable of caching an environment that is re-used in CI/CD until it is intelligently invalidated and automatically rebuilt, without any action from the end user?
For example, let's say that inside your `.gitlab-ci.yml`, you have a stage that sets up your environment:

```yaml
stages:
  - setup
  - build
  - deploy

setup:
  stage: setup
  script:
    - npm install
    - bundle install
    - ...more commands to set up your environment
```
We could add a flag inside the `setup` block to make this visible to the Dependency Proxy (e.g. `use-dependency-proxy: true`). Once this stage is complete, the Dependency Proxy would:
- Create an "environment" container image from a snapshot of the current image state (i.e. after all the packages are downloaded, etc)
- Store this inside the GitLab registry and associate it to the project, marking it as an "environment" image
- Run the rest of the stages using this new "environment" image as the base image until the pipeline is complete
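The flag described above might be written like this (a sketch only; `use-dependency-proxy` is a hypothetical keyword that does not exist in GitLab CI today):

```yaml
stages:
  - setup
  - build
  - deploy

setup:
  stage: setup
  use-dependency-proxy: true  # hypothetical keyword: snapshot this stage's result as an "environment" image
  script:
    - npm install
    - bundle install
```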
Once the pipeline is complete, we'll then have a new "environment" image that should include all the packages / dependencies for the target project. Any future pipelines run from this point would skip the setup stage and use the "environment" image for the remaining stages - this is the Dependency Proxy taking action.
This would bring a bunch of benefits to the user:
- Pipeline running times are decreased because we can use the existing Dependency Proxy features so that we don't have to download a base image every time
- Pipeline running times are decreased because a stage is skipped for subsequent builds that do not invalidate the environment
- Pipeline running times are further decreased by no longer needing to fetch / install dependencies or perform set up
- The user can easily manage / alter the setup stage to cache as much or as little as they want
- A reusable image is produced that the user can use in places outside of GitLab
- The "environment" image can be downloaded and inspected locally like any other image
- Possibly could be used as part of a process to archive or keep old builds / artifacts
Note: Similar functionality is semi-possible using the Dependency Proxy today, but it relies on the user managing their "environment" image manually, and therefore isn't something a real user is likely to do.
If we can automate the creation, usage, and invalidation of these "environment" images, then we offer our users a powerful way of enhancing their pipelines via the Dependency Proxy.
1. How does the Dependency Proxy invalidate an environment? This is a big problem to solve. One approach is that the Dependency Proxy could know a list of files to watch, and if they are changed as part of the commit that triggers the pipeline, we run the `setup` job again (which would generate a new "environment" image). Perhaps this list of files could be supplied via the Dependency Proxy page, or as part of the `.gitlab-ci.yml` file itself (and the watch list would probably include that file too).
Alternatively / additionally, we could offer ways to invalidate images with keywords used in commit messages, manual buttons, expiry dates, etc.
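The file-watch idea could be expressed in `.gitlab-ci.yml` roughly like this (a sketch; the `invalidate-on` keyword is hypothetical, and the watched files are just examples for an npm/Bundler project):

```yaml
setup:
  stage: setup
  use-dependency-proxy: true  # hypothetical keyword from the proposal above
  invalidate-on:              # hypothetical: rebuild the "environment" image when any of these change
    - package.json
    - package-lock.json
    - Gemfile
    - Gemfile.lock
    - .gitlab-ci.yml
  script:
    - npm install
    - bundle install
```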
2. How will the "environment" image get the latest project code? This is another tricky one which I don't know the answer to. The simple answer: `git pull` at the beginning of every stage that relies on the image. An alternative might be some magic using the current CI/CD cache methods. This is going to need investigation.
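The simple answer could be sketched as a `before_script` on every job that runs on the "environment" image (the registry path is hypothetical; `$CI_COMMIT_SHA` is a standard GitLab CI variable):

```yaml
build:
  stage: build
  image: registry.example.com/group/project/environment:latest  # hypothetical "environment" image
  before_script:
    - git fetch origin                # refresh the snapshot's stale checkout
    - git checkout "$CI_COMMIT_SHA"   # pin to the commit that triggered this pipeline
  script:
    - npm run build
```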
3. Where would these images be stored? The same place as the current Dependency Proxy images. Ideally in a location that offers a quicker download, or no download at all.
4. What would the UI display? The current Dependency Proxy UI could be updated to include a way of managing these "environment" images. For example, they could be manually invalidated, deleted from storage or other such expiry rules configured. We could also offer a way to download these images so users can inspect their environments at any time (it would be really cool if we could build some kind of frontend UI for this too). I think we should also capture the Pipeline Job output for the setup task that built the "environment" image and allow this log to be viewed from the frontend.
This has the potential to be quite disruptive, as the Dependency Proxy would be changing how CI/CD pipelines are run - for example, we would conditionally run stages based on whether the environment needs rebuilding. A possible way to mitigate this might be for the Dependency Proxy to "inject" a modified `.gitlab-ci.yml`, which is then run. The "injected" YAML file would remove the `setup` job and replace the base image with the "environment" image.
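The injected pipeline for the example above might look roughly like this (a sketch; the image path is hypothetical):

```yaml
# Injected pipeline: the `setup` stage is removed and every remaining job
# runs on the previously built "environment" image instead.
stages:
  - build
  - deploy

default:
  image: registry.example.com/group/project/environment:latest  # hypothetical "environment" image

build:
  stage: build
  script:
    - npm run build

deploy:
  stage: deploy
  script:
    - npm run deploy
```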
We may want to wait until we can store metadata alongside container registry images so these "environment" images can be easily distinguished. For now, we may be able to create a dedicated container repository for the project or use a naming convention.
A feature like this is likely to need cross-stage validation and consideration. :)
Another risk is that the Dependency Proxy isn't really "proxying" anything here - so maybe this isn't the right place for a feature like this.
## Any other advantages?

Here are a few:
- This is a general approach to speeding up pipelines and therefore is compatible with all package managers (i.e. it won't require specific package manager considerations or code)
- More than just packages can be proxied / cached
- "Environment" images could be made available to more than just GitLab Pipelines and functionality could be built on top of this (for example, an API, alerts / notifications when new environments are built, sync with external storage or backup systems, etc)
- Could be the foundation for easier environment creation for users in the future
- Dependency Proxy effectiveness could be easily measured and displayed to the user - all you would need to do is compare the running time of the pipeline that created the environment vs the running time of one that used the "environment" image (assuming the pipeline isn't edited in-between)
- "Environment" images should be able to take advantage of CI variables in the standard way. Or perhaps we could offer a feature to "lock" variables to be defined as they were during the initial setup stage
- This could work alongside package manager proxying - it doesn't need to replace it
## What does success look like, and how can we measure that?
Subsequent pipeline running times are reduced for the user.