Define container registry database schema to help drive online garbage collection
Problem to solve
The container registry uses the file system to store both manifests and layers. With this architecture, it is not possible to run garbage collection (removing untagged layers) while the container registry is online. We're planning to implement online garbage collection so that GitLab.com and other customers can run garbage collection while their registry is online.
To implement online garbage collection, we need a different method for storing metadata. A necessary step towards this is defining the metadata fields required by the container registry and then proposing a schema for a metadata database.
Proposal
This schema was designed based on the Object Specification described in #207147 (comment 294543918) and the discussion around the possible options in #207147 (comment 297386977). Please refer to these comments for additional context.
Notes
- There is an ongoing discussion around which database type (SQL vs NoSQL) and brand (e.g. PostgreSQL vs MongoDB/RethinkDB) should be used. This schema should be adapted once that discussion is closed. For now we assume it'll be SQL and PostgreSQL.
- This schema is subject to changes during the development phase. This issue will be updated if/as those changes happen.
ER Model
Entities
repositories
Name | Type | Nullable | Constraints | Description |
---|---|---|---|---|
id | serial | N | PRIMARY KEY | Incremental ID. |
name | text | N | | This is the name of the repository, e.g. `gitlab-container-registry`. |
path | text | N | UNIQUE | This is the full path of a repository, e.g. `gitlab-org/build/cng/gitlab-container-registry`. |
parent_id | integer | Y | REFERENCES repositories(id) | If a repository is a root repository (e.g. `gitlab-org`), this is set to `null`. Otherwise it contains the `id` of the parent repository. As an example, for the `build` repository, `parent_id` would be the `id` of the `gitlab-org` repository. |
created_at | timestamp | N | | Creation timestamp (the time at which the first manifest was pushed to the repository). |
deleted_at | timestamp | Y | | Soft deletion timestamp. |
With `parent_id` we can build an Adjacency List to represent the relationship of nested repositories.
If using an SQL database, we can use efficient recursive queries to identify the ancestors and descendants of a given repository at any level, through Common Table Expressions (CTEs) (e.g. PostgreSQL `WITH`). Some NoSQL document databases also support recursive queries (e.g. MongoDB `$graphLookup`).
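As a minimal sketch of the recursive queries mentioned above, the following uses an in-memory SQLite database as a stand-in for PostgreSQL (an assumption made only so the example is self-contained; both support `WITH RECURSIVE`). The table is reduced to the columns that matter for the hierarchy, and the `descendants` and `ancestors` helpers are hypothetical names, not part of the registry.

```python
import sqlite3

# SQLite stands in for PostgreSQL here (assumption); only the columns
# needed to walk the adjacency list are kept.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE repositories (
    id        INTEGER PRIMARY KEY,
    name      TEXT NOT NULL,
    path      TEXT NOT NULL UNIQUE,
    parent_id INTEGER REFERENCES repositories(id)
);
INSERT INTO repositories (id, name, path, parent_id) VALUES
    (1, 'gitlab-org', 'gitlab-org', NULL),
    (2, 'build', 'gitlab-org/build', 1),
    (3, 'cng', 'gitlab-org/build/cng', 2),
    (4, 'gitlab-container-registry',
        'gitlab-org/build/cng/gitlab-container-registry', 3);
""")


def descendants(conn, repo_id):
    """Return the paths of every repository nested under repo_id."""
    rows = conn.execute("""
        WITH RECURSIVE tree(id, path) AS (
            SELECT id, path FROM repositories WHERE parent_id = ?
            UNION ALL
            SELECT r.id, r.path
            FROM repositories r JOIN tree t ON r.parent_id = t.id
        )
        SELECT path FROM tree ORDER BY path
    """, (repo_id,)).fetchall()
    return [path for (path,) in rows]


def ancestors(conn, repo_id):
    """Return the paths of every ancestor of repo_id, nearest first."""
    rows = conn.execute("""
        WITH RECURSIVE chain(id, path, parent_id, depth) AS (
            SELECT p.id, p.path, p.parent_id, 1
            FROM repositories c JOIN repositories p ON c.parent_id = p.id
            WHERE c.id = ?
            UNION ALL
            SELECT p.id, p.path, p.parent_id, ch.depth + 1
            FROM chain ch JOIN repositories p ON ch.parent_id = p.id
        )
        SELECT path FROM chain ORDER BY depth
    """, (repo_id,)).fetchall()
    return [path for (path,) in rows]
```

Calling `descendants(conn, 1)` walks down from `gitlab-org`, while `ancestors(conn, 4)` walks up from `gitlab-container-registry` to the root. A root repository has no ancestors, so the second helper returns an empty list for it.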
manifests
A repository can have many manifests (`1:N`).
Name | Type | Nullable | Constraints | Description |
---|---|---|---|---|
id | serial | N | PRIMARY KEY | Incremental ID. |
repository_id | integer | N | REFERENCES repositories(id), UNIQUE (repository_id,digest) | The repository that this manifest belongs to. |
schema_version | integer | N | | Image manifest schema version. |
media_type | text | N | | The media type of the image manifest document. |
digest | text | N | UNIQUE (repository_id,digest) | The digest of the manifest. |
configuration_id | integer | N | REFERENCES manifest_configurations(id) | Manifest configuration. |
payload | json | N | | Full manifest payload that matches the advertised digest (same format and order of attributes). |
created_at | timestamp | N | | Creation timestamp. |
marked_at | timestamp | Y | | Timestamp of the last time the manifest was marked by the garbage collector. |
deleted_at | timestamp | Y | | Soft deletion timestamp. |
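To illustrate what the composite `UNIQUE (repository_id,digest)` constraint implies, here is a rough sketch using an in-memory SQLite database in place of PostgreSQL (an assumption for the sake of a runnable example). The table is trimmed to the two constrained columns, the digest is a placeholder, and `add_manifest` is a hypothetical helper.

```python
import sqlite3

# SQLite stands in for PostgreSQL (assumption); only the columns covered
# by the composite unique constraint are modelled.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE manifests (
    id            INTEGER PRIMARY KEY,
    repository_id INTEGER NOT NULL,
    digest        TEXT NOT NULL,
    UNIQUE (repository_id, digest)
);
""")


def add_manifest(conn, repository_id, digest):
    """Insert a manifest; return False if the repository already has it."""
    try:
        conn.execute(
            "INSERT INTO manifests (repository_id, digest) VALUES (?, ?)",
            (repository_id, digest),
        )
        return True
    except sqlite3.IntegrityError:
        return False


DIGEST = "sha256:" + "a" * 64  # placeholder digest, not a real image

first = add_manifest(conn, 1, DIGEST)       # new manifest in repository 1
other_repo = add_manifest(conn, 2, DIGEST)  # same digest, different repository
duplicate = add_manifest(conn, 1, DIGEST)   # same digest, same repository
```

Under this constraint the same digest can exist once per repository: the first two inserts succeed and the third is rejected.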
layers
A manifest references a set of layers.
Name | Type | Nullable | Constraints | Description |
---|---|---|---|---|
id | serial | N | PRIMARY KEY | Incremental ID. |
media_type | text | N | | The media type of the image layer. |
digest | text | N | UNIQUE | The digest of the layer. |
created_at | timestamp | N | | Creation timestamp. |
marked_at | timestamp | Y | | Timestamp of the last time the layer was marked by the garbage collector. |
deleted_at | timestamp | Y | | Soft deletion timestamp. |
manifest_layers
A manifest references a set of layers and the same layer may be referenced by several manifests (`N:N`).
The order of the layers must be guaranteed, so they must be inserted in this table in the correct order. Because `id` is incremental, reading the rows sorted by `id` returns the layers in that same order.
Name | Type | Nullable | Constraints | Description |
---|---|---|---|---|
id | serial | N | PRIMARY KEY | Incremental ID. |
manifest_id | integer | N | REFERENCES manifests(id), UNIQUE (manifest_id,layer_id) | The manifest represented in this relationship. |
layer_id | integer | N | REFERENCES layers(id), UNIQUE (manifest_id,layer_id) | The layer associated with the manifest. |
created_at | timestamp | N | | Creation timestamp. |
marked_at | timestamp | Y | | Timestamp of the last time the manifest layer link was marked by the garbage collector. |
deleted_at | timestamp | Y | | Soft deletion timestamp. |
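The ordering requirement for `manifest_layers` can be sketched as follows, again with an in-memory SQLite database standing in for PostgreSQL (an assumption). The tables are trimmed to the relevant columns, the digests are placeholders, and `link_layers` and `layer_digests` are hypothetical helpers. The important detail is the explicit `ORDER BY` on the link `id`, since SQL gives no ordering guarantee without it.

```python
import sqlite3

# SQLite stands in for PostgreSQL (assumption); digests are placeholders.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE layers (
    id     INTEGER PRIMARY KEY,
    digest TEXT NOT NULL UNIQUE
);
CREATE TABLE manifest_layers (
    id          INTEGER PRIMARY KEY AUTOINCREMENT,
    manifest_id INTEGER NOT NULL,
    layer_id    INTEGER NOT NULL REFERENCES layers(id),
    UNIQUE (manifest_id, layer_id)
);
INSERT INTO layers (id, digest) VALUES
    (1, 'sha256:base'), (2, 'sha256:deps'), (3, 'sha256:app');
""")


def link_layers(conn, manifest_id, layer_ids):
    """Insert the links one by one, in manifest order, so link ids increase."""
    for layer_id in layer_ids:
        conn.execute(
            "INSERT INTO manifest_layers (manifest_id, layer_id) VALUES (?, ?)",
            (manifest_id, layer_id),
        )


def layer_digests(conn, manifest_id):
    """Read a manifest's layer digests back in insertion order."""
    rows = conn.execute("""
        SELECT l.digest
        FROM manifest_layers ml JOIN layers l ON l.id = ml.layer_id
        WHERE ml.manifest_id = ?
        ORDER BY ml.id
    """, (manifest_id,)).fetchall()
    return [digest for (digest,) in rows]


# The layer order of manifest 10 deliberately differs from the layer ids.
link_layers(conn, manifest_id=10, layer_ids=[3, 1, 2])
link_layers(conn, manifest_id=11, layer_ids=[1, 2])
```

Here the order comes from the `manifest_layers.id` of each link, not from `layers.id`, so a layer shared by two manifests can sit at a different position in each.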
manifest_lists
A repository can have many manifest lists (`1:N`).
Name | Type | Nullable | Constraints | Description |
---|---|---|---|---|
id | serial | N | PRIMARY KEY | Incremental ID. |
repository_id | integer | N | REFERENCES repositories(id) | The repository that this manifest list belongs to. |
schema_version | integer | N | | Image manifest list schema version. |
media_type | text | Y | | The media type of the image manifest list document. |
payload | json | N | | Full manifest list payload that matches the advertised digest (same format and order of attributes). |
created_at | timestamp | N | | Creation timestamp. |
marked_at | timestamp | Y | | Timestamp of the last time the manifest list was marked by the garbage collector. |
deleted_at | timestamp | Y | | Soft deletion timestamp. |
manifest_list_items
A manifest list is composed of several manifests (`1:N`).
Name | Type | Nullable | Constraints | Description |
---|---|---|---|---|
id | serial | N | PRIMARY KEY | Incremental ID. |
manifest_list_id | integer | N | REFERENCES manifest_lists(id), UNIQUE (manifest_list_id,manifest_id) | The manifest list represented in this relationship. |
manifest_id | integer | N | REFERENCES manifests(id), UNIQUE (manifest_list_id,manifest_id) | The manifest that belongs to the manifest list. |
created_at | timestamp | N | | Creation timestamp. |
deleted_at | timestamp | Y | | Soft deletion timestamp. |
manifest_configurations
A manifest has a configuration (`1:1`). The same configuration may be associated with multiple manifests (`1:N`).
Name | Type | Nullable | Constraints | Description |
---|---|---|---|---|
id | serial | N | PRIMARY KEY | Incremental ID. |
media_type | text | N | | The media type of the image configuration. |
digest | text | N | UNIQUE | The digest of the image configuration. |
payload | json | N | | Full image configuration payload that matches the advertised digest (same format and order of attributes). |
created_at | timestamp | N | | Creation timestamp. |
deleted_at | timestamp | Y | | Soft deletion timestamp. |
tags
A tag references a single manifest at a time (`1:1`).
Name | Type | Nullable | Constraints | Description |
---|---|---|---|---|
id | serial | N | PRIMARY KEY | Incremental ID. |
name | text | N | UNIQUE (name,manifest_id) | The name of the tag. |
manifest_id | integer | N | REFERENCES manifests(id), UNIQUE (name,manifest_id) | The manifest that the tag references. |
created_at | timestamp | N | | Creation timestamp. |
updated_at | timestamp | Y | | Last update timestamp. This field is updated whenever a tag is switched to a different manifest. |
deleted_at | timestamp | Y | | Soft deletion timestamp. |
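The `updated_at` behaviour described for tags could look roughly like the sketch below. It uses an in-memory SQLite database as a stand-in for PostgreSQL and stores timestamps as ISO 8601 text (both assumptions). `put_tag` is a hypothetical helper, and for brevity it looks tags up by name only, ignoring the repository that the referenced manifest belongs to.

```python
import sqlite3
from datetime import datetime, timezone

# SQLite stands in for PostgreSQL (assumption); timestamps are ISO 8601 text.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tags (
    id          INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    manifest_id INTEGER NOT NULL,
    created_at  TEXT NOT NULL,
    updated_at  TEXT,
    deleted_at  TEXT,
    UNIQUE (name, manifest_id)
);
""")


def now():
    """Current UTC time as an ISO 8601 string."""
    return datetime.now(timezone.utc).isoformat()


def put_tag(conn, name, manifest_id):
    """Create the tag, or switch an existing one to another manifest.

    Simplification: tags are looked up by name only (single repository).
    """
    row = conn.execute(
        "SELECT id, manifest_id FROM tags"
        " WHERE name = ? AND deleted_at IS NULL",
        (name,),
    ).fetchone()
    if row is None:
        conn.execute(
            "INSERT INTO tags (name, manifest_id, created_at) VALUES (?, ?, ?)",
            (name, manifest_id, now()),
        )
    elif row[1] != manifest_id:
        # Switching the tag to a different manifest sets updated_at.
        conn.execute(
            "UPDATE tags SET manifest_id = ?, updated_at = ? WHERE id = ?",
            (manifest_id, now(), row[0]),
        )


QUERY = "SELECT manifest_id, updated_at FROM tags WHERE name = 'latest'"

put_tag(conn, "latest", 1)
created = conn.execute(QUERY).fetchone()   # freshly created tag
put_tag(conn, "latest", 2)
switched = conn.execute(QUERY).fetchone()  # same tag, now on manifest 2
```

A freshly created tag keeps `updated_at` empty, and the switch updates the existing row in place rather than adding a second row for the same name.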