Container Registry improvements
We had a call with @grzesiek to talk about Container Registry improvements: https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/7154.
Currently, the Registry is a standalone product that we connect to only when needed. We do not track the data stored there, because so far that was not needed.
This issue describes a plan for greatly improving Container Registry support, with the exact steps to execute. The end result gives us full visibility into registry operation and full control over the storage used, and lets us introduce retention policies with garbage collection of blobs, something that is missing now.
What is that for?
- Make the Registry work well when its usage is CI-oriented, by allowing additional metadata to be stored and removing old, unused data,
- Solve the inability to delete old images,
- Make it possible to easily track what data is stored in registry,
- Make it possible to introduce retention policies for images stored in registry,
- Have Registry under full control.
Proposal
- We have an amazing community contribution from @andrebsguedes which adds multi-level container images: !7154 (closed). This is great, as it lets us figure out the first step that we want to execute: store the Container Repository for a Project in the GitLab database. This is the first iteration. We will know about all container repositories that are in use, that is, the ones that can contain data. This is slated for %9.1. For now, we will create the data lazily, to support both the previous format (API-based connection to the Registry) and a new format: the Container Registry Image object (`container_image` for short),
- [https://gitlab.com/gitlab-org/gitlab-ce/issues/30657] @andrebsguedes had a `registry_events` notification from the Registry to GitLab. This will go into iteration 2. When we receive a notification that new data was pushed, we will update the counters stored in `container_image`: `tags_count`, `layers_count`, `layers_size` and `last_updated_at`. This will let us track from GitLab the amount of data stored in different projects. This is already partially implemented in !7154 (closed); the code will be moved to the next MR,
- In iteration 3 we will extend `registry_events` to start tracking `container_image_tag`, a first-class object that stores a pushed tag in the GitLab DB. We will also create `container_image_tag_blob` and `container_image_blob`, which will allow us to store all blobs used by a given tag, with full knowledge of the size and cost of each stored layer. This will let us track all references to images, tags and blobs stored in the Registry,
- In iteration 4 we will implement real deletes of blobs, as we will know which blobs are unreferenced. This will happen when a tag is deleted, the same way as it is done currently for `LFS objects`. Today this has to be done manually with `garbage-collect`,
- In iteration 5 we will start a discussion with the Product team on retention policy and automated removal of tags, as we will be able to give each tag an expiry date, allow it to be promoted to a different label if needed, and so on. This will work beautifully with iteration 4, as we will remove real data. This is also the moment when we can introduce registry storage limits.
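The counter update planned for iteration 2 could look roughly like the sketch below. It is only an illustration in Python, not GitLab code: the `ContainerImage` class, the `apply_push_event` function, the `known_layers` bookkeeping field and the shape of the event dictionary (`tag`, `layers`, `digest`, `size`) are all assumptions made for this example, and the real `registry_events` payload may look different.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, List, Optional


@dataclass
class ContainerImage:
    """Stand-in for a container_image row; not actual GitLab code."""
    name: str
    project_id: int
    tags_list: List[str] = field(default_factory=list)
    tags_count: int = 0
    layers_count: int = 0
    layers_size: int = 0
    last_updated_at: Optional[datetime] = None
    # Assumption: digest -> size bookkeeping, so a re-pushed layer is not counted twice.
    known_layers: Dict[str, int] = field(default_factory=dict)


def apply_push_event(image: ContainerImage, event: dict,
                     now: Optional[datetime] = None) -> ContainerImage:
    """Update the counters after a (hypothetical) push notification."""
    tag = event["tag"]
    if tag not in image.tags_list:
        image.tags_list.append(tag)
    for layer in event["layers"]:
        # setdefault keeps the first recorded size for a digest we already know.
        image.known_layers.setdefault(layer["digest"], layer["size"])
    image.tags_count = len(image.tags_list)
    image.layers_count = len(image.known_layers)
    image.layers_size = sum(image.known_layers.values())
    image.last_updated_at = now or datetime.now(timezone.utc)
    return image


image = ContainerImage(name="group/project/app", project_id=1)
apply_push_event(image, {"tag": "latest", "layers": [
    {"digest": "sha256:aaa", "size": 100},
    {"digest": "sha256:bbb", "size": 250},
]})
apply_push_event(image, {"tag": "v1", "layers": [
    {"digest": "sha256:bbb", "size": 250},  # shared with "latest"
    {"digest": "sha256:ccc", "size": 50},
]})
print(image.tags_count, image.layers_count, image.layers_size)  # prints: 2 3 400
```

The shared layer is counted once, which is why per-project storage numbers need some notion of blob identity even before iteration 3 introduces blobs as first-class objects.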
The difficulty
The biggest difficulty is data migration. However, we will be doing it online, by introducing the iterations one after another while still supporting the older method of data access for some time. At the end of iteration 4 we will have full coverage of what is stored in the registry and will be able to rely only on what is in GitLab.
Data models
container_image:
- id
- name
- project_id
- last_updated_at
# removed once container_image_tag is introduced
- tags_count
- tags_list
- layers_size
- layers_count
container_image_tag:
- id
- container_image_id
- project_id
- name
- manifest
container_image_tag_blob:
- id
- container_image_tag_id
- container_blob_id
container_blob:
- id
- blob_sha256
- size
- used_count
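
To make the relationship between tags and blobs concrete, here is a minimal sketch of the models above as an in-memory SQLite schema, together with the "find unreferenced blobs after a tag delete" step from iteration 4. It is an illustration under assumptions, not the planned migration: the column types, the `container_image_id` foreign key name, and the `build_demo` and `delete_tag` helpers are hypothetical, the iteration-2 counter columns are left out, and the reference count is derived with a query instead of being stored in `used_count`.

```python
import sqlite3

SCHEMA = """
CREATE TABLE container_image (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    project_id INTEGER NOT NULL,
    last_updated_at TEXT
);
CREATE TABLE container_image_tag (
    id INTEGER PRIMARY KEY,
    container_image_id INTEGER NOT NULL REFERENCES container_image(id),
    project_id INTEGER NOT NULL,
    name TEXT NOT NULL,
    manifest TEXT
);
CREATE TABLE container_blob (
    id INTEGER PRIMARY KEY,
    blob_sha256 TEXT NOT NULL UNIQUE,
    size INTEGER NOT NULL
);
CREATE TABLE container_image_tag_blob (
    id INTEGER PRIMARY KEY,
    container_image_tag_id INTEGER NOT NULL REFERENCES container_image_tag(id),
    container_blob_id INTEGER NOT NULL REFERENCES container_blob(id)
);
"""


def build_demo() -> sqlite3.Connection:
    """Create one image with two tags that share a blob."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute(
        "INSERT INTO container_image (id, name, project_id) VALUES (1, 'app', 1)"
    )
    conn.executemany(
        "INSERT INTO container_image_tag (id, container_image_id, project_id, name) "
        "VALUES (?, 1, 1, ?)",
        [(1, "latest"), (2, "v1")],
    )
    conn.executemany(
        "INSERT INTO container_blob (id, blob_sha256, size) VALUES (?, ?, ?)",
        [(1, "sha256:aaa", 100), (2, "sha256:bbb", 250), (3, "sha256:ccc", 50)],
    )
    conn.executemany(
        "INSERT INTO container_image_tag_blob (container_image_tag_id, container_blob_id) "
        "VALUES (?, ?)",
        [(1, 1), (1, 2), (2, 2), (2, 3)],
    )
    return conn


def delete_tag(conn: sqlite3.Connection, tag_id: int) -> list:
    """Delete a tag and its blob links; return digests no remaining tag references."""
    conn.execute(
        "DELETE FROM container_image_tag_blob WHERE container_image_tag_id = ?",
        (tag_id,),
    )
    conn.execute("DELETE FROM container_image_tag WHERE id = ?", (tag_id,))
    rows = conn.execute(
        """
        SELECT b.blob_sha256
        FROM container_blob b
        LEFT JOIN container_image_tag_blob l ON l.container_blob_id = b.id
        WHERE l.id IS NULL
        ORDER BY b.blob_sha256
        """
    ).fetchall()
    return [row[0] for row in rows]


conn = build_demo()
print(delete_tag(conn, 1))  # prints: ['sha256:aaa']
```

Deleting `latest` leaves `sha256:bbb` in place because `v1` still references it; only `sha256:aaa` becomes a candidate for a real delete in the Registry. A stored `used_count` would avoid the join at the cost of keeping the counter in sync on every push and delete.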