Investigate: Container registry "Rename Lease"

Problem 🚒

Issue Description

A "rename-lease" (a type of "repository-lease") is a read AND write lock on a container-repository namespace (i.e. a base container repository and its sub-repositories, if any, as specified by a path) or on a potential namespace (i.e. a base path that is yet to be created). The "rename-lease" makes it impossible to read from, write to, or create any repositories under the path the lease was granted for.

The "rename-lease" is expected to be used to prevent reads and writes to all origin and target repositories (that the lease was granted for) for a short period. It is a means of avoiding potential race conditions and preventing data consistency/integrity issues when renaming a GitLab project. See section "(2)" of gitlab#381392 (closed) for more context on why a "rename-lease" is needed.

For this issue we will be digging into (in more detail):

  • What a rename lease entails programmatically
  • How long is reasonable for a "rename lease"
  • Where rename-leases would be held
  • Whether it is worth generalizing the idea of a "rename-lease" to a "repository-lease" for other operations (e.g. moving a repository)
  • Any other ideas regarding rename leases

Assessment ✍🏾

Context

A "rename-lease" was proposed in gitlab#381392 (closed) as a mechanism for preventing data inconsistency problems that could arise when a repository rename is being performed.

The general idea behind the "rename-lease" is a lock of sorts on the resource space (i.e. the repository) being renamed-from and renamed-to. In the following sections we dive into further details on how the "rename-lease" can be realized.

Repository lease 🛂

Although in this issue we are considering only "rename" operations, the idea of a rename-lease proposed in gitlab#381392 (closed) could be further generalized to a "repository-lease". Generalizing this concept would allow other or future repository locking operations (e.g. moving a repository to a new namespace) to take advantage of the leasing mechanism if needed.

From this section onward we will refer to the "rename-lease" as the generalized "repository-lease".

Where should the Repository lease be kept 🔍

To ensure that every instance of the registry (i.e. when the registry runs as a replica set) honors the repository lease, the lease must be stored in a central location that is accessible by all registry instances of a GitLab installation. This limits our choices for where the lease can be hosted to: postgres, redis, or the storage-backend.

Redis: The most straightforward approach to realizing repository leases is to use redis. With redis, we can create a lease and assign a TTL for when the lease will expire, without any extra overhead on the registry side to manage/purge expired leases.

However, as tempting as it sounds, this may not be a great idea, purely because redis is currently not tightly coupled to the registry: the registry is fully functional (less some optimization) without redis, and by deciding to use redis we implicitly require redis to be available for the registry to use this feature. This would limit this feature's impact on self-managed installs; see #884 (comment 1257962216).

Storage backend: Generally speaking, the storage backend should not be considered anything more than blob storage, and as such I think it will serve us best to steer away from using it as the home for the repository lease.

Postgres: The last option worth considering would be to keep the repository leases in postgres. This should be feasible given that:

  • This feature will only be available for the registry that utilizes the metadata database

After gaining some insight from the team on the current and future tradeoffs of relying solely on postgres for the first implementation, as opposed to redis, for storing the repository-lease (see discussion here), there is common agreement that utilizing redis would be the MVC, while keeping in mind that we will need to adapt to future requirements if needed.

Below is the summary of the primary reasons for opting to use redis:

  1. This feature is first gated behind the metadata DB (not Redis), which is not yet available on self-managed, and a lot will change until that is no longer the case. So I think we shouldn't try to guess the future AND incur significantly more work/risk when implementing a solution for the present.

  2. We would benefit from having the first implementation being in Redis, since it fits the use case the most cleanly, as you mention above. We'll no doubt encounter some challenges rolling this out, so we might as well make our work lighter where we can. Then later, if the need arises, we can use the experience we gained from the initial Redis-only approach to design a better interface, but also to implement the less well suited backends.

Repository lease implementation with redis 🐁

This implementation involves storing the repository lease claims in a redis hash.
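To make the lease semantics concrete, here is a minimal in-memory sketch of "claim a key only if it is free, and let it expire after a TTL". It is not the registry's implementation: `InMemoryLeaseStore` and its methods are hypothetical names, and the store only imitates what Redis would provide natively (for a plain string key, `acquire` is comparable to `SET key value NX EX ttl`).

```python
import time


class InMemoryLeaseStore:
    """Toy stand-in for Redis: a key disappears once its TTL has elapsed."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock      # injectable so expiry can be simulated
        self._entries = {}       # key -> (value, expires_at)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            # Drop expired leases lazily; Redis handles this through the TTL.
            del self._entries[key]
            return None
        return value

    def acquire(self, key, value, ttl_seconds):
        """Claim the key only if nobody currently holds it."""
        if self.get(key) is not None:
            return False
        self._entries[key] = (value, self._clock() + ttl_seconds)
        return True

    def release(self, key):
        """Give the lease up early, e.g. once a rename has completed."""
        self._entries.pop(key, None)


# Usage with a fake clock so expiry can be shown without sleeping.
now = [0.0]
store = InMemoryLeaseStore(clock=lambda: now[0])
first = store.acquire("lease:old-name", {"lease_reason": "rename"}, 60)
second = store.acquire("lease:old-name", {"lease_reason": "rename"}, 60)
now[0] = 61.0  # the 60 second TTL has passed
third = store.acquire("lease:old-name", {"lease_reason": "rename"}, 60)
print(first, second, third)
```

The point of the sketch is that expiry requires no clean-up job on the registry side: a lease that is never released simply stops blocking once its TTL runs out.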

Repository Lease Data structure

The key name should be set using the full path of the target repository. A plausible key is of the form: registry:api:{repository-lease:<namespace path>:<path hash>}, following the documented best practices. The naming convention for the key above will allow for CROSSSLOT compatibility, name clash avoidance (in case of Redis instances shared by multiple applications) and discoverability.

The value should be an object representing a lease having the following (minimum) fields:

| Field | Type | Description | Example |
| --- | --- | --- | --- |
| granted_to | string | The name of the already existing repository that requested the lease | /my-group/my-sub-group/old-name |
| lease_reason | string | Reason the lease was granted to the repository identified in granted_to | rename/move ... |

Example

When an existing repository (/my-group/my-sub-group/old-name) requests and is granted a repository lease for a rename operation to the name new-name, two keys will be created in redis with TTL, like so:

| Key | Value |
| --- | --- |
| registry:api:{repository-lease:/my-group/my-sub-group/old-name:hash(/my-group/my-sub-group/old-name)} | {"granted_to":"/my-group/my-sub-group/old-name", "lease_reason":"rename"} |
| registry:api:{repository-lease:/my-group/my-sub-group/new-name:hash(/my-group/my-sub-group/new-name)} | {"granted_to":"/my-group/my-sub-group/old-name", "lease_reason":"rename"} |
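The two entries in this example could be derived as in the sketch below. The function names are hypothetical, and the choice of SHA-256 for the path hash is an assumption: the proposal only says "path hash" and does not name a hash function.

```python
import hashlib
import json


def lease_key(repository_path: str) -> str:
    """Build a key of the form registry:api:{repository-lease:<path>:<path hash>}."""
    # Assumption: SHA-256 hex digest. The proposal does not specify the hash.
    path_hash = hashlib.sha256(repository_path.encode("utf-8")).hexdigest()
    return "registry:api:{repository-lease:%s:%s}" % (repository_path, path_hash)


def lease_value(granted_to: str, lease_reason: str) -> str:
    """Serialize the minimum lease fields described in the table above."""
    return json.dumps({"granted_to": granted_to, "lease_reason": lease_reason})


def rename_lease_entries(old_path: str, new_path: str) -> dict:
    """Both the origin and the target path are leased to the origin repository."""
    value = lease_value(old_path, "rename")
    return {lease_key(old_path): value, lease_key(new_path): value}


entries = rename_lease_entries(
    "/my-group/my-sub-group/old-name",
    "/my-group/my-sub-group/new-name",
)
for key, value in entries.items():
    print(key, value)
```

Note that both values name the origin repository in `granted_to`, which is what lets a later request from the same origin recognize the lease as its own.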

Repository Lease TTL

My suggestion is that we start with a generous but not overly intrusive expiration time of 60 seconds per repository lease, and then dial it up or down in future iterations as we gain more insight into how long a rename operation really takes.

To fine tune the expiration time of a repository lease in future iterations, we would need to consider the upper-bound of the sum of:

  • the average time spent carrying out the network calls between the registry and rails
  • the average time spent by rails before utilizing a granted repository lease
  • the average estimated time the registry will require to perform the actual rename operation on the database
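As a small worked illustration of that sum, the sketch below adds the three components and applies a safety margin to turn averages into an upper bound. The function name, the `safety_factor` parameter, and the sample durations are all made up for illustration; they are not measurements.

```python
import math


def lease_ttl_seconds(network_s, rails_wait_s, db_rename_s, safety_factor=2.0):
    """Estimate a lease TTL from the three average durations listed above.

    safety_factor is a hypothetical margin to cover slower-than-average runs.
    """
    total = network_s + rails_wait_s + db_rename_s
    return math.ceil(total * safety_factor)


# Placeholder inputs only: 1.5s of network calls, 10s before Rails uses the
# lease, 3.5s for the database rename.
print(lease_ttl_seconds(1.5, 10, 3.5))  # (1.5 + 10 + 3.5) * 2 = 30
```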

Metrics 🌲 👂

We will need to collect kibana-viewable logs on:

  • When a request to the registry was rejected due to a lease being in place
  • When a lease was successfully issued

We will need to collect sentry-viewable errors on:

  • When a lease failed to be issued (for reasons other than a conflicting lease already existing)

Limitations 🛩

  • For scalability and performance reasons, this feature will start by being limited to projects with no more than 1000 container repositories. For GitLab.com, this covers 99.98% of all projects (source). We can then increase this later based on metrics and pending a decision in https://gitlab.com/gitlab-org/gitlab/-/issues/357014 (internal). Attempting to update more than 1000 repositories (base and sub-repositories) should yield a 422 Unprocessable Entity response.

  • This feature (at the moment of conception) would only be available on registry instances that use the metadata database AND are backed by redis.

Overview of repository lease Interaction flows

In this section, we will use the sample repository my-group/my-sub-group/old-name to denote an existing repository and new-name to represent the new name we would like to give the repository.

This is a simplified view of each flow; a few trivial details are purposefully left out to reduce complexity and verbosity.

In particular, this description ignores cases where an internal error occurs within the registry or in the services the registry interacts with (i.e. redis or postgres). In those cases it can be assumed that a 500 Internal Server Error status code will be returned to the caller. Similarly, when a request is malformed we can assume a 400 Bad Request will be returned to the caller.

Rename Pre-validation

sequenceDiagram
  autonumber
  participant G as GitLab Rails
  participant R as GitLab Container Registry
  participant RR as Container Registry Redis
  participant P as Container Registry Postgres
  G->>R: PATCH /gitlab/v1/repositories/my-group/my-sub-group/old-name/?dry-run=true Body:{name:"new-name"}
  R->>RR: Check if a repository lease exists
  alt There is a conflicting repository lease <br>(i.e. another existing repository holds a lease for "new-name")
  R->>G: 409 Conflict
  else There are no conflicting repository leases
    R->>P: Check whether a repository with path "my-group/my-sub-group/new-name" <br>exists in the database's repository table
  alt "my-group/my-sub-group/new-name" already exists
  R->>G: 400 Bad Request
  else "my-group/my-sub-group/new-name" does not exist
  R->>P: Check if old path ("my-group/my-sub-group/old-name") contains more than 1000 sub repos
  alt path contains more than 1000 sub repos
  R->>G: 422 Unprocessable Entity
  else path contains 1000 or fewer sub repos
  R->>RR: Retrieve the existing repository lease (if any)
  alt There is no existing repository lease for <br>"my-group/my-sub-group/old-name" to "my-group/my-sub-group/new-name"
  R->>RR: Create the necessary repository lease
  else There is an existing repository lease for the same repositories referenced in the request
  R->>RR: Extend the existing repository lease TTL (to allow time to successfully complete a rename)
  end
  R->>G: 200 OK Body:{"ttl":"Xs"}
  end
  end
  end

Performing a rename

sequenceDiagram
  autonumber
  participant G as GitLab Rails
  participant R as GitLab Container Registry
  participant RR as Container Registry Redis
  participant P as Container Registry Postgres
  G->>R: PATCH /gitlab/v1/repositories/my-group/my-sub-group/old-name/?dry-run=false Body:{name:"new-name"}
  R->>RR: Check if a repository lease exists
  alt There is a conflicting repository lease <br>(i.e. another existing repository holds a lease for "new-name")
  R->>G: 409 Conflict
  else There are no conflicting repository leases
    R->>P: Check whether a repository with path "my-group/my-sub-group/new-name" <br>exists in the database's repository table
  alt "my-group/my-sub-group/new-name" already exists
  R->>G: 400 Bad Request
  else "my-group/my-sub-group/new-name" does not exist
  R->>P: Check if old path ("my-group/my-sub-group/old-name") contains more than 1000 sub repos
  alt path contains more than 1000 sub repos
  R->>G: 422 Unprocessable Entity
  else path contains 1000 or fewer sub repos
  R->>RR: Retrieve the existing repository lease (if any)
  alt There is no existing repository lease for <br>"my-group/my-sub-group/old-name" to "my-group/my-sub-group/new-name"
  R->>RR: Create the necessary repository lease
  else There is an existing repository lease for the same repositories referenced in the request
  R->>RR: Extend the existing repository lease TTL (to allow time to successfully complete the rename)
  end
  R->>P: Perform rename operation on database repository table and expire lease
  R->>G: 204 No Content
  end
  end
  end
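
The pre-validation and rename flows above differ only in their final step, so they can be sketched as one function with a dry_run flag. This is a simplified illustration, not the registry's handler: `FakeRegistry` and `rename` are hypothetical names, Postgres and Redis are replaced by in-memory collections, lease expiry is not modeled, and the origin path is assumed to contain at least one "/".

```python
from dataclasses import dataclass, field

MAX_REPOSITORIES = 1000   # limit from the "Limitations" section
LEASE_TTL_SECONDS = 60    # starting TTL suggested in "Repository Lease TTL"


@dataclass
class FakeRegistry:
    """In-memory stand-ins for Postgres (repositories) and Redis (leases)."""
    repositories: set = field(default_factory=set)
    leases: dict = field(default_factory=dict)  # leased path -> granted_to

    def under(self, path):
        """The base repository at `path` plus all of its sub-repositories."""
        return {r for r in self.repositories
                if r == path or r.startswith(path + "/")}


def rename(reg, old_path, new_name, dry_run):
    """Return (status_code, body) following the sequence diagrams above."""
    new_path = old_path.rsplit("/", 1)[0] + "/" + new_name

    # 1. Another repository already holds a lease on the target name.
    holder = reg.leases.get(new_path)
    if holder is not None and holder != old_path:
        return 409, None
    # 2. The target name is already taken in the database.
    if new_path in reg.repositories:
        return 400, None
    # 3. Too many repositories (base and sub-repositories) to rename.
    affected = reg.under(old_path)
    if len(affected) > MAX_REPOSITORIES:
        return 422, None
    # 4. Create the lease pair, or refresh it if this origin already holds it.
    reg.leases[old_path] = old_path
    reg.leases[new_path] = old_path
    if dry_run:
        return 200, {"ttl": f"{LEASE_TTL_SECONDS}s"}
    # 5. Rename the rows, then expire the lease.
    renamed = {new_path + r[len(old_path):] for r in affected}
    reg.repositories = (reg.repositories - affected) | renamed
    reg.leases.pop(old_path, None)
    reg.leases.pop(new_path, None)
    return 204, None


reg = FakeRegistry(repositories={"my-group/my-sub-group/old-name"})
print(rename(reg, "my-group/my-sub-group/old-name", "new-name", dry_run=True))
print(rename(reg, "my-group/my-sub-group/old-name", "new-name", dry_run=False))
```

Because the dry run leaves the lease in place, the follow-up request from the same origin passes the conflict check in step 1 and only refreshes the lease before renaming.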

Blocking repository access due to ongoing rename

sequenceDiagram
  autonumber
  participant G as GitLab Rails
  participant R as GitLab Container Registry
  participant P as Container Registry Redis
  G->>R: GET/POST/PATCH/PUT /gitlab/v1/repositories/my-group/my-sub-group/old-name/... 
  R->>P: Check lock
  R->>G: 409 Conflict

sequenceDiagram
  autonumber
  participant C as Docker Client
  participant R as GitLab Container Registry
  participant P as Container Registry Redis
  C->>R: GET/POST/PATCH/PUT/DELETE /v2/my-group/my-sub-group/old-name/...
  R->>P: Check lock
  R->>C: 409 Conflict
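
The "Check lock" step in both diagrams can be sketched as a prefix match of the requested repository against the leased paths. `is_blocked` and `status_for` are hypothetical helpers, and the sketch assumes the repository path has already been extracted from the request URL.

```python
def is_blocked(repository_path, leased_paths):
    """True if the repository, or a base repository above it, is leased."""
    repo = repository_path.strip("/")
    for leased in leased_paths:
        base = leased.strip("/")
        # A lease covers the base repository and all of its sub-repositories.
        if repo == base or repo.startswith(base + "/"):
            return True
    return False


def status_for(repository_path, leased_paths):
    """Return 409 while a lease is in place, otherwise None (request proceeds)."""
    return 409 if is_blocked(repository_path, leased_paths) else None


leased = {"/my-group/my-sub-group/old-name", "/my-group/my-sub-group/new-name"}
print(status_for("my-group/my-sub-group/old-name/sub-repo", leased))
print(status_for("my-group/my-sub-group/old-name-2", leased))
```

Matching on `base + "/"` rather than on the bare prefix keeps a sibling such as old-name-2 from being blocked by a lease on old-name.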
Edited by SAhmed