
Assign repositories unique IDs in Praefect

Sami Hiltunen requested to merge smh-cluster-repository-id into master

Praefect currently identifies repositories solely by their virtual storage and relative path, which is causing issues. Removing and recreating a repository with the same virtual storage and relative path maps the new repository to the same disk path and database state. Recreation can then fail because of the leftover state, and this is currently impacting backup restoration.

The virtual storage and relative path are client-provided. They are used as the primary key in the database because they are currently the only way to identify a repository. If the client changes the relative path, Praefect has to update the keys in the database and physically move the replicas on the disks of the Gitaly nodes. This is error-prone: the operation may succeed only partially, causing Praefect to lose track of the repository. This is impacting the interoperability of Geo and Gitaly Cluster, as Geo performs renames frequently.

The delete/recreation problem can be avoided by ensuring the newly created repository does not map to the previously deleted repository's state even if it uses the same virtual storage and relative path. The rename problem can be avoided by storing the replicas in static paths determined by Praefect and giving the client only the illusion the repository was moved.

To facilitate both fixes, this MR adds repository_id to the schema. A repository ID uniquely identifies a repository in the cluster and carries no other meaning. Praefect can then use the generated repository ID to map the repository on disk to a unique path, regardless of the client-provided virtual storage and relative path. As the repository ID can be used to identify the replicas on the disks, renames become atomic database updates: Praefect only needs to update the mapping from (virtual_storage, relative_path) to repository_id. Each newly created repository gets a unique ID, so it won't map to the same database or disk state as a previously deleted repository, whose ID is different.
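
The mapping described above can be sketched with an in-memory SQLite database. This is a simplified illustration, not Praefect's actual schema or code: the table and column names follow the description, while the helper functions and the use of SQLite's AUTOINCREMENT to generate IDs that are never reused are assumptions made for the example.

```python
import sqlite3

# Hypothetical, simplified schema; not the migration added by this MR.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE repositories ("
    " repository_id INTEGER PRIMARY KEY AUTOINCREMENT,"  # never reused after delete
    " virtual_storage TEXT NOT NULL,"
    " relative_path TEXT NOT NULL,"
    " UNIQUE (virtual_storage, relative_path))"
)

def create_repository(virtual_storage, relative_path):
    """Insert a logical repository and return its generated ID."""
    cur = conn.execute(
        "INSERT INTO repositories (virtual_storage, relative_path) VALUES (?, ?)",
        (virtual_storage, relative_path),
    )
    return cur.lastrowid

def rename_repository(virtual_storage, old_path, new_path):
    """A rename only rewrites the client-facing path; the ID stays the same."""
    conn.execute(
        "UPDATE repositories SET relative_path = ?"
        " WHERE virtual_storage = ? AND relative_path = ?",
        (new_path, virtual_storage, old_path),
    )

def delete_repository(virtual_storage, relative_path):
    conn.execute(
        "DELETE FROM repositories WHERE virtual_storage = ? AND relative_path = ?",
        (virtual_storage, relative_path),
    )

def lookup_repository_id(virtual_storage, relative_path):
    """Resolve the client-provided pair to a repository ID, or None."""
    row = conn.execute(
        "SELECT repository_id FROM repositories"
        " WHERE virtual_storage = ? AND relative_path = ?",
        (virtual_storage, relative_path),
    ).fetchone()
    return row[0] if row else None
```

In this sketch a rename is a single UPDATE statement, so it either applies fully or not at all, and a repository recreated under a previously used path receives a fresh ID rather than inheriting the deleted repository's.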

Uniqueness of (virtual_storage, relative_path) will still be enforced: at most one repository that is still considered to exist can map to a given combination.
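
As a minimal sketch of that guarantee, a unique constraint on the pair rejects a second live repository with the same virtual storage and relative path. The schema and the try_create helper below are hypothetical and use SQLite only for illustration.

```python
import sqlite3

# Hypothetical schema for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE repositories ("
    " repository_id INTEGER PRIMARY KEY AUTOINCREMENT,"
    " virtual_storage TEXT NOT NULL,"
    " relative_path TEXT NOT NULL,"
    " UNIQUE (virtual_storage, relative_path))"
)

def try_create(virtual_storage, relative_path):
    """Return the new repository ID, or None if the pair is already taken."""
    try:
        cur = conn.execute(
            "INSERT INTO repositories (virtual_storage, relative_path)"
            " VALUES (?, ?)",
            (virtual_storage, relative_path),
        )
        return cur.lastrowid
    except sqlite3.IntegrityError:
        # The unique constraint on the pair was violated.
        return None
```

The same relative path can still be used in a different virtual storage, because the constraint covers the combination rather than either column alone.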

The migration to repository IDs has to be split over three releases to ensure zero-downtime upgrades:

  1. The first release adds the columns to all relevant tables. Each logical repository in the repositories table gets an ID generated for it. In the same release, we'll change the code to link any newly created repositories and replicas correctly via the repository ID.
  2. The second release will contain a migration to connect any existing records via the repository ID. Praefects that have not yet been upgraded may create records concurrently, which is fine because they already connect the records via the repository ID. Since we can now assume every record is connected via the repository ID, we'll update every query to join the records based on repository_id rather than (virtual_storage, relative_path). The router will look up replica relative paths rather than assuming they are the client-provided ones.
  3. As all queries are now joining via repository_id, the replicas' relative path can be decoupled from the logical repository's relative path. This release will hash the repository ID to generate a unique relative path for each repository and store it. Praefects that have not yet been upgraded are at this point already looking up the relative path to use, so this is backwards compatible.
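
The path derivation in the third step could look roughly like the following. The description only says the repository ID is hashed to generate a unique relative path; the choice of SHA-256, the "@cluster/repositories" prefix, the two-level directory fan-out, and the function name are all assumptions made for this sketch.

```python
import hashlib

def derive_replica_path(repository_id: int) -> str:
    """Derive a storage path from the repository ID alone (illustrative layout)."""
    digest = hashlib.sha256(str(repository_id).encode("utf-8")).hexdigest()
    # The first hex characters spread repositories across subdirectories.
    # Appending the ID keeps paths unique even if two digests share a prefix.
    return f"@cluster/repositories/{digest[:2]}/{digest[2:4]}/{repository_id}"
```

Because the client-provided relative path is not an input, renaming a repository leaves the derived path untouched, and a recreated repository with a new ID lands in a different directory.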

This MR does the first release step. It adds the schema and makes sure new records in repositories, storage_repositories and repository_assignments link via the repository ID correctly. Replication jobs scheduled by the reconciler and the request finalizer are also updated to include the repository ID.

Related to #3485 (closed)

Edited by Sami Hiltunen
