Issue #207147
Issue created Feb 18, 2020 by Daniel Croft (@dcroft, Developer)

Define container registry database schema to help drive online garbage collection

Problem to solve

The container registry uses the file system to store both manifests and layers. With this architecture it is not possible to run garbage collection (removing untagged layers) while the container registry is online. We're planning to implement online garbage collection so that GitLab.com and other customers can run garbage collection while their registry is online.

To implement online garbage collection, we need a different method for storing metadata. A necessary step towards this is defining the metadata fields required by the container registry and then proposing a schema for a metadata database.

Proposal

This schema was designed based on the Object Specification described in #207147 (comment 294543918) and the discussion around the possible options in #207147 (comment 297386977). Please refer to these comments for additional context.

Notes

  • There is an ongoing discussion about which database type (SQL vs NoSQL) and product (e.g. PostgreSQL vs MongoDB/RethinkDB) should be used. This schema should be adapted once that discussion is closed. For now we assume SQL and PostgreSQL.

  • This schema is subject to change during the development phase. This issue will be updated if and when changes happen.

ER Model

Diagram: registry

Attachments: registry.dbm (pgModeler), registry_ddl.sql

Entities

repositories

| Name | Type | Nullable | Constraints | Description |
|------|------|----------|-------------|-------------|
| id | serial | N | PRIMARY KEY | Incremental ID. |
| name | text | N | | The name of the repository, e.g. gitlab-container-registry. |
| path | text | N | UNIQUE | The full path of the repository, e.g. gitlab-org/build/cng/gitlab-container-registry. |
| parent_id | integer | Y | REFERENCES repositories(id) | If a repository is a root repository (e.g. gitlab-org), this is set to null. Otherwise it contains the id of the parent repository. As an example, for the build repository, parent_id would be the id of the gitlab-org repository. |
| created_at | timestamp | N | | Creation timestamp (the time at which the first manifest was pushed to the repository). |
| deleted_at | timestamp | Y | | Soft deletion timestamp. |

With parent_id we can build an Adjacency List to represent the relationship of nested repositories.

If using an SQL database, we can use efficient recursive queries to identify the ancestors and descendants of a given repository at any level, through Common Table Expressions (CTEs) (e.g. PostgreSQL WITH). Some NoSQL document databases also support recursive queries (e.g. MongoDB $graphLookup).
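As an illustration of the adjacency list and recursive CTE idea, here is a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for PostgreSQL (both accept WITH RECURSIVE), keeps only the columns needed for the traversal, and the function names ancestors and descendants are hypothetical helpers, not part of the proposal.

```python
import sqlite3

# In-memory SQLite is an assumption made to keep the sketch runnable;
# the issue itself assumes PostgreSQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE repositories (
    id        INTEGER PRIMARY KEY,
    name      TEXT NOT NULL,
    path      TEXT NOT NULL UNIQUE,
    parent_id INTEGER REFERENCES repositories(id)
);
INSERT INTO repositories (id, name, path, parent_id) VALUES
    (1, 'gitlab-org', 'gitlab-org', NULL),
    (2, 'build', 'gitlab-org/build', 1),
    (3, 'cng', 'gitlab-org/build/cng', 2),
    (4, 'gitlab-container-registry',
        'gitlab-org/build/cng/gitlab-container-registry', 3);
""")


def ancestors(conn, repo_id):
    """Return ancestor names, nearest parent first, by following parent_id upwards."""
    rows = conn.execute("""
        WITH RECURSIVE chain(id, name, parent_id, depth) AS (
            SELECT id, name, parent_id, 0 FROM repositories WHERE id = ?
            UNION ALL
            SELECT r.id, r.name, r.parent_id, c.depth + 1
            FROM repositories r JOIN chain c ON r.id = c.parent_id
        )
        SELECT name FROM chain WHERE depth > 0 ORDER BY depth
    """, (repo_id,)).fetchall()
    return [name for (name,) in rows]


def descendants(conn, repo_id):
    """Return the paths of all nested repositories, at any level."""
    rows = conn.execute("""
        WITH RECURSIVE tree(id, path, depth) AS (
            SELECT id, path, 0 FROM repositories WHERE id = ?
            UNION ALL
            SELECT r.id, r.path, t.depth + 1
            FROM repositories r JOIN tree t ON r.parent_id = t.id
        )
        SELECT path FROM tree WHERE depth > 0 ORDER BY depth, id
    """, (repo_id,)).fetchall()
    return [path for (path,) in rows]


print(ancestors(conn, 4))    # ['cng', 'build', 'gitlab-org']
print(descendants(conn, 1))
```

Both directions use the same pattern: the anchor row is the starting repository, and the recursive step joins on parent_id either upwards (ancestors) or downwards (descendants).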

manifests

A repository can have many manifests (1:N).

| Name | Type | Nullable | Constraints | Description |
|------|------|----------|-------------|-------------|
| id | serial | N | PRIMARY KEY | Incremental ID. |
| repository_id | integer | N | REFERENCES repositories(id); UNIQUE (repository_id, digest) | The repository that this manifest belongs to. |
| schema_version | integer | N | | Image manifest schema version. |
| media_type | text | N | | The media type of the image manifest document. |
| digest | text | N | UNIQUE (repository_id, digest) | The digest of the manifest. |
| configuration_id | integer | N | REFERENCES manifest_configurations(id) | Manifest configuration. |
| payload | json | N | | Full manifest payload that matches the advertised digest (same format and order of attributes). |
| created_at | timestamp | N | | Creation timestamp. |
| marked_at | timestamp | Y | | Timestamp of the last time the manifest was marked by the garbage collector. |
| deleted_at | timestamp | Y | | Soft deletion timestamp. |
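The requirement that payload keeps the "same format and order of attributes" follows from how digests work: a digest is a hash over the exact bytes of the document. The sketch below assumes SHA-256 digests in the sha256:<hex> form used by registries; the manifest_digest helper and the sample payload are illustrative only.

```python
import hashlib
import json


def manifest_digest(payload: bytes) -> str:
    """Digest over the raw payload bytes (SHA-256 is assumed here)."""
    return "sha256:" + hashlib.sha256(payload).hexdigest()


# A trimmed, hypothetical manifest payload as received from a client.
original = (
    b'{"schemaVersion":2,'
    b'"mediaType":"application/vnd.docker.distribution.manifest.v2+json"}'
)

# Parsing and re-serializing keeps the JSON content but changes the bytes
# (key order and whitespace differ).
reserialized = json.dumps(json.loads(original), sort_keys=True).encode()

same_content = json.loads(original) == json.loads(reserialized)
same_digest = manifest_digest(original) == manifest_digest(reserialized)

print(same_content)  # True
print(same_digest)   # False
```

This is one reason a text-preserving column type matters: PostgreSQL's json type stores an exact copy of the input text, whereas jsonb does not preserve whitespace or key order.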

layers

A manifest references a set of layers.

| Name | Type | Nullable | Constraints | Description |
|------|------|----------|-------------|-------------|
| id | serial | N | PRIMARY KEY | Incremental ID. |
| media_type | text | N | | The media type of the image layer. |
| digest | text | N | UNIQUE | The digest of the layer. |
| created_at | timestamp | N | | Creation timestamp. |
| marked_at | timestamp | Y | | Timestamp of the last time the layer was marked by the garbage collector. |
| deleted_at | timestamp | Y | | Soft deletion timestamp. |

manifest_layers

A manifest references a set of layers and the same layer may be referenced by several manifests (N:N).

The order of the layers must be preserved, so rows must be inserted into this table in the order in which the layers appear in the manifest. Reading the rows ordered by id then returns the layers in that same order.

| Name | Type | Nullable | Constraints | Description |
|------|------|----------|-------------|-------------|
| id | serial | N | PRIMARY KEY | Incremental ID. |
| manifest_id | integer | N | REFERENCES manifests(id); UNIQUE (manifest_id, layer_id) | The manifest represented in this relationship. |
| layer_id | integer | N | REFERENCES layers(id); UNIQUE (manifest_id, layer_id) | The layer associated with the manifest. |
| created_at | timestamp | N | | Creation timestamp. |
| marked_at | timestamp | Y | | Timestamp of the last time the manifest layer link was marked by the garbage collector. |
| deleted_at | timestamp | Y | | Soft deletion timestamp. |
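A small sketch of the ordering rule for manifest_layers, again using sqlite3 as a stand-in for PostgreSQL and a reduced set of columns. The helpers link_layers and ordered_layers are hypothetical. The point is that the link rows are inserted in manifest order and read back with an explicit ORDER BY on the link id, since SQL gives no ordering guarantee without one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE layers (
    id     INTEGER PRIMARY KEY,
    digest TEXT NOT NULL UNIQUE
);
CREATE TABLE manifest_layers (
    id          INTEGER PRIMARY KEY AUTOINCREMENT,
    manifest_id INTEGER NOT NULL,
    layer_id    INTEGER NOT NULL REFERENCES layers(id),
    UNIQUE (manifest_id, layer_id)
);
""")

# Layers may already exist (pushed earlier, shared with other manifests),
# so their own ids say nothing about their position in a given manifest.
conn.executemany(
    "INSERT INTO layers (id, digest) VALUES (?, ?)",
    [(1, "sha256:ccc"), (2, "sha256:aaa"), (3, "sha256:bbb")],
)


def link_layers(conn, manifest_id, layer_digests):
    """Insert one link row per layer, in the order listed by the manifest."""
    for digest in layer_digests:
        (layer_id,) = conn.execute(
            "SELECT id FROM layers WHERE digest = ?", (digest,)
        ).fetchone()
        conn.execute(
            "INSERT INTO manifest_layers (manifest_id, layer_id) VALUES (?, ?)",
            (manifest_id, layer_id),
        )


def ordered_layers(conn, manifest_id):
    """Read layer digests back in manifest order, using the link id."""
    rows = conn.execute("""
        SELECT l.digest
        FROM manifest_layers ml JOIN layers l ON l.id = ml.layer_id
        WHERE ml.manifest_id = ?
        ORDER BY ml.id
    """, (manifest_id,)).fetchall()
    return [digest for (digest,) in rows]


link_layers(conn, 1, ["sha256:aaa", "sha256:bbb", "sha256:ccc"])
print(ordered_layers(conn, 1))  # ['sha256:aaa', 'sha256:bbb', 'sha256:ccc']
```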

manifest_lists

A repository can have many manifest lists (1:N).

| Name | Type | Nullable | Constraints | Description |
|------|------|----------|-------------|-------------|
| id | serial | N | PRIMARY KEY | Incremental ID. |
| repository_id | integer | N | REFERENCES repositories(id) | The repository that this manifest list belongs to. |
| schema_version | integer | N | | Image manifest list schema version. |
| media_type | text | Y | | The media type of the image manifest list document. |
| payload | json | N | | Full manifest list payload that matches the advertised digest (same format and order of attributes). |
| created_at | timestamp | N | | Creation timestamp. |
| marked_at | timestamp | Y | | Timestamp of the last time the manifest list was marked by the garbage collector. |
| deleted_at | timestamp | Y | | Soft deletion timestamp. |

manifest_list_items

A manifest list is composed of several manifests (1:N).

| Name | Type | Nullable | Constraints | Description |
|------|------|----------|-------------|-------------|
| id | serial | N | PRIMARY KEY | Incremental ID. |
| manifest_list_id | integer | N | REFERENCES manifest_lists(id); UNIQUE (manifest_list_id, manifest_id) | The manifest list represented in this relationship. |
| manifest_id | integer | N | REFERENCES manifests(id); UNIQUE (manifest_list_id, manifest_id) | The manifest that belongs to the manifest list. |
| created_at | timestamp | N | | Creation timestamp. |
| deleted_at | timestamp | Y | | Soft deletion timestamp. |

manifest_configurations

A manifest has exactly one configuration, while the same configuration may be associated with multiple manifests (1:N).

| Name | Type | Nullable | Constraints | Description |
|------|------|----------|-------------|-------------|
| id | serial | N | PRIMARY KEY | Incremental ID. |
| media_type | text | N | | The media type of the image configuration. |
| digest | text | N | UNIQUE | The digest of the image configuration. |
| payload | json | N | | Full image configuration payload that matches the advertised digest (same format and order of attributes). |
| created_at | timestamp | N | | Creation timestamp. |
| deleted_at | timestamp | Y | | Soft deletion timestamp. |

tags

A tag references a single manifest at a time (1:1).

| Name | Type | Nullable | Constraints | Description |
|------|------|----------|-------------|-------------|
| id | serial | N | PRIMARY KEY | Incremental ID. |
| name | text | N | UNIQUE (name, manifest_id) | The name of the tag. |
| manifest_id | integer | N | REFERENCES manifests(id); UNIQUE (name, manifest_id) | The manifest that the tag references. |
| created_at | timestamp | N | | Creation timestamp. |
| updated_at | timestamp | Y | | Last update timestamp. This field is updated whenever a tag is switched to a different manifest. |
| deleted_at | timestamp | Y | | Soft deletion timestamp. |
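To show how the tags table could feed garbage collection, here is a hedged sketch of two operations: switching a tag to a different manifest (which sets updated_at) and listing manifests that no live tag references. This is not the garbage collector design from the issue. It uses sqlite3 in place of PostgreSQL, a reduced manifests table, a single repository, and the hypothetical helpers switch_tag and untagged_manifests.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE manifests (
    id         INTEGER PRIMARY KEY,
    digest     TEXT NOT NULL,
    deleted_at TEXT
);
CREATE TABLE tags (
    id          INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    manifest_id INTEGER NOT NULL REFERENCES manifests(id),
    created_at  TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    updated_at  TEXT,
    deleted_at  TEXT,
    UNIQUE (name, manifest_id)
);
INSERT INTO manifests (id, digest) VALUES (1, 'sha256:old'), (2, 'sha256:new');
INSERT INTO tags (name, manifest_id) VALUES ('latest', 1);
""")


def switch_tag(conn, name, manifest_id):
    """Point an existing tag at another manifest and record the update time.

    Assumption: a single repository. With several repositories the update
    would also have to be scoped to the repository of the tagged manifest.
    """
    conn.execute(
        "UPDATE tags SET manifest_id = ?, updated_at = CURRENT_TIMESTAMP "
        "WHERE name = ? AND deleted_at IS NULL",
        (manifest_id, name),
    )


def untagged_manifests(conn):
    """Digests of live manifests that no live tag references."""
    rows = conn.execute("""
        SELECT m.digest
        FROM manifests m
        WHERE m.deleted_at IS NULL
          AND NOT EXISTS (
              SELECT 1 FROM tags t
              WHERE t.manifest_id = m.id AND t.deleted_at IS NULL
          )
        ORDER BY m.id
    """).fetchall()
    return [digest for (digest,) in rows]


before = untagged_manifests(conn)   # ['sha256:new']
switch_tag(conn, "latest", 2)
after = untagged_manifests(conn)    # ['sha256:old']
print(before, after)
```

After the switch, the previously tagged manifest becomes a candidate for collection, and a query like this can run while the registry keeps serving requests because it only reads metadata.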
Edited Mar 09, 2020 by João Pereira