Dependency proxy for container registries
Problem to solve
The GitLab Container Registry allows users to build and deploy Docker images for use in CI/CD pipelines or deployments to Kubernetes. However, each time an image is built, its dependencies have to be fetched and downloaded from external sources. This introduces risk, because pipelines depend on third-party sources for every dependency. It also slows down builds, because the build process has to wait for each download to complete.
We will build a caching dependency proxy for container registries into the Rails codebase, at the group level. The proxy will receive requests and return the upstream image. Each time a new image is downloaded, it will be added to the Dependency Proxy and be cached for fast, reliable future downloads. The more you use the proxy, the faster your future builds will run.
This is the MVC, so there are availability limitations. Please see the list below:
- The feature is only available for Enterprise Edition users who are using Puma. Because gitlab.com uses Unicorn and not Puma, it is only available for self-managed GitLab users. See here for more information on how to enable Puma. We will enable this feature for Unicorn (and gitlab.com) via the follow-up issue https://gitlab.com/gitlab-org/gitlab-ee/issues/11548.
- The feature is only enabled for public projects, because the caching proxy does not yet support authentication. Authentication will be enabled with https://gitlab.com/gitlab-org/gitlab-ee/issues/11582, at which point the proxy will become available for all projects.
- The dependency proxy is available at the instance level, but requires per-group configuration. We plan to make this easier to set up across an instance via https://gitlab.com/gitlab-org/gitlab-ee/issues/11638, but the feature is usable in the meantime.
- This feature will be available in GitLab Ultimate only.
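The availability conditions above, together with the instance-level flag and the per-group toggle described later, can be read as a single check. The following is a minimal sketch of that reading, not the actual implementation; `ProxyContext`, its fields, and the message strings are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ProxyContext:
    """Hypothetical inputs; none of these names come from the GitLab codebase."""
    web_server: str              # "puma" or "unicorn"
    license_tier: str            # e.g. "ultimate"
    instance_flag_enabled: bool  # admin flag for the whole instance
    group_setting_enabled: bool  # per-group toggle
    project_visibility: str      # "public", "internal", or "private"


def unmet_requirements(ctx: ProxyContext) -> List[str]:
    """Return the MVC availability conditions that the context fails."""
    problems = []
    if ctx.web_server != "puma":
        problems.append("web server must be Puma")
    if ctx.license_tier != "ultimate":
        problems.append("GitLab Ultimate is required")
    if not ctx.instance_flag_enabled:
        problems.append("feature is disabled for the instance")
    if not ctx.group_setting_enabled:
        problems.append("feature is not enabled for the group")
    if ctx.project_visibility != "public":
        problems.append("project must be public")
    return problems


ok = ProxyContext("puma", "ultimate", True, True, "public")
blocked = ProxyContext("unicorn", "ultimate", True, False, "private")
print(unmet_requirements(ok))       # empty list: the proxy can be used
print(unmet_requirements(blocked))  # lists each failed condition
```

Returning the list of failed conditions, rather than a single boolean, makes it easier to tell a user which limitation applies to them.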
This MVC will act as a pull-through cache of upstream images. Caching was determined to be the first step since it provides a useful foundation that other features (security, rules, etc.) can build upon.
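As an illustration of the pull-through idea, here is a minimal in-memory sketch in Python. It is an assumption-laden toy, not the Rails implementation: `PullThroughCache`, `fetch_upstream`, and `fake_upstream` are hypothetical names, and a real proxy would store blobs on disk or in object storage rather than in a dictionary.

```python
from typing import Callable, Dict


class PullThroughCache:
    """In-memory pull-through cache keyed by blob digest (illustrative only)."""

    def __init__(self, fetch_upstream: Callable[[str], bytes]) -> None:
        self._fetch_upstream = fetch_upstream  # hypothetical upstream client
        self._blobs: Dict[str, bytes] = {}
        self.upstream_requests = 0

    def get(self, digest: str) -> bytes:
        # Serve from the cache when the blob has been seen before.
        if digest in self._blobs:
            return self._blobs[digest]
        # Otherwise fetch once from upstream and keep a copy for next time.
        self.upstream_requests += 1
        blob = self._fetch_upstream(digest)
        self._blobs[digest] = blob
        return blob


def fake_upstream(digest: str) -> bytes:
    # Stand-in for a real registry request.
    return f"contents of {digest}".encode()


cache = PullThroughCache(fake_upstream)
cache.get("sha256:aaa")  # first pull goes upstream
cache.get("sha256:aaa")  # second pull is served locally
print(cache.upstream_requests)  # prints 1
```

Keying on the content digest means a cached blob never goes stale, which is one reason caching makes a simple foundation for later features.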
We chose the Rails codebase because it offers a quicker time to market for the feature than immediately investigating a different solution, such as Go or other options. The codebase will remain simple enough in the MVC state that we can learn from the feature, get customer feedback, and decide later if a different technology approach is more suitable.
Neither security scans nor upstream repositories that require authentication are supported yet. The MVC is intended purely to act as a caching proxy server for the defined repositories. These features will arrive in later releases, once we have built the functionality to retrieve and cache images locally.
In terms of internal security implementation, we have ensured that the service that generates requests to upstream servers avoids making requests to the local network. As with all of our other functions that accept a user-entered URL, this prevents potential SSRF attacks that use the GitLab server. Note that there is a flag to disable these protections, if needed.
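The kind of check described here can be sketched with Python's standard `ipaddress` module. This is only an illustration of the general technique, not GitLab's own URL validation: the function names and the `allow_local` parameter (standing in for the flag that disables the protection) are hypothetical, and DNS resolution is left to the caller so the example runs without network access.

```python
import ipaddress
from urllib.parse import urlsplit


def is_blocked_address(address: str) -> bool:
    """True if the IP address points at the local network."""
    ip = ipaddress.ip_address(address)
    return ip.is_private or ip.is_loopback or ip.is_link_local


def upstream_url_allowed(url, resolved_addresses, allow_local=False):
    """Decide whether a user-entered upstream URL may be requested.

    `resolved_addresses` are the IPs the host resolved to; checking them,
    rather than the hostname, covers names that point at internal IPs.
    """
    if urlsplit(url).scheme not in ("http", "https"):
        return False
    if allow_local:
        return True
    if not resolved_addresses:
        return False
    return not any(is_blocked_address(a) for a in resolved_addresses)


print(upstream_url_allowed("https://registry.example.com/v2/", ["8.8.8.8"]))
print(upstream_url_allowed("https://registry.example.com/v2/", ["10.0.0.5"]))
```

In practice the request should connect to the same address that was checked; otherwise a second DNS lookup could return a different, internal address.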
For the MVC we will not implement authentication, but will also not enable the feature on gitlab.com. Enabling gitlab.com access will come in a follow-up issue.
Group vs. Project vs. Instance Level
The reasons this will be implemented at the group instead of project level are:
- The project itself already has too much complexity. If we can move features that are only needed at the group level out of it, we should.
- Putting the feature into the same UI as a container registry is confusing for a user. You should be able to use a registry proxy even if you don't use a registry itself.
- People are likely to create one project per group and re-use it for a whole group as a single proxy endpoint.
- In the future we will introduce dependency proxies for Maven and npm. It makes sense to have all dependency proxy features in one place in the UI.
- Billing is at the group level, so the feature can be used on GitLab.com.
User workflow
- For users who meet the availability requirements, navigate to the group home page
- Go to Overview → Dependency Proxy
- Turn the feature on by toggling the button
- Click ‘save changes’
- Copy the dependency proxy URL for use from the terminal
- From the terminal, run: $ docker pull proxy/url/package name version
- GitLab makes a request to Docker Hub and downloads all blobs that are not stored locally
- Docker needs a network connection and will look for the latest version
- Updates to the latest version will replace the cached version
- The manifest (about 1 KB of text that lists the dependent blobs) is not cached
- Base Docker images are often made up of multiple images, all of which will be proxied
- From the UI you can now see how many blobs are in the proxy and the total size
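The pull steps above (manifest fetched fresh, only missing blobs downloaded, blob count and total size shown in the UI) can be sketched as follows. The digests, sizes, and function names are made up for illustration, and the download itself is simulated.

```python
from typing import Dict, List, Tuple


def blobs_to_download(manifest_digests: List[str], local_blobs: Dict[str, int]) -> List[str]:
    """Digests listed in the manifest that are not stored locally yet."""
    return [d for d in manifest_digests if d not in local_blobs]


def proxy_summary(local_blobs: Dict[str, int]) -> Tuple[int, int]:
    """Blob count and total size in bytes, as a UI might display them."""
    return len(local_blobs), sum(local_blobs.values())


# Hypothetical data: digest -> size in bytes of blobs already cached.
local = {"sha256:aaa": 2_000_000}
# The manifest is not cached, so it is passed in fresh on every pull.
manifest = ["sha256:aaa", "sha256:bbb"]

missing = blobs_to_download(manifest, local)  # only sha256:bbb is missing
for digest in missing:
    local[digest] = 3_500_000                 # pretend the download happened

count, total_bytes = proxy_summary(local)
print(count, total_bytes)  # prints: 2 5500000
```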
Instance administrator workflow
There is an admin flag to control whether the proxy feature is available for a given instance.
For now, this can be displayed under Admin Area → CI/CD → Container Registry.
The feature is on by default because the endpoint only has access to other public registries and is not a security concern.
In the future, if the project is authenticated, the upstream proxy will also be authenticated to match. So we shouldn't be exposing a new unauthenticated service, for example if all of a given customer's projects are private but GitLab is reachable on the internet.
What does success look like, and how can we measure that?
- We should measure the number of instances with the caching proxy enabled
- The feature will be utilized by at least X% of people using pipelines
- We can measure cache hit ratio as a measure of how much this feature is helping
- Less directly measurable: we expect this feature to drive adoption of Package more generally, and even of CI/CD, among users for whom the lack of a proxy was a blocker
- User engagement in the form of issues/feedback will be a critical success measure of adoption, as with any completely new category
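The cache hit ratio mentioned above is simply the share of requests served from the cache. A minimal sketch, assuming the proxy keeps hit and miss counters (the function name and the sample numbers are illustrative, not measured data):

```python
def cache_hit_ratio(hits: int, misses: int) -> float:
    """Fraction of blob requests served from the proxy cache."""
    total = hits + misses
    if total == 0:
        return 0.0  # no traffic yet; avoid dividing by zero
    return hits / total


print(f"{cache_hit_ratio(hits=90, misses=10):.0%}")  # prints 90%
```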
Future Priorities (in priority order)
- Authentication: so we can support private projects
- Default to on for groups
- Improve discovery and navigation of GitLab Package features
- Purge cache functionality
- Add limits to dependency proxy
- Unicorn support: This will allow for rollout to a broader audience
- Add ability to search the proxy