
Backfill repository_id and replica_path in existing database records

Sami Hiltunen requested to merge smh-link-repository-id into master
  • Praefect generates a repository ID to uniquely identify each repository. Newly created records are already linked via the repository ID, but historical records in the database are not. Before the queries can be updated to join records on the repository ID, all relevant records must carry it. This commit adds a migration that links the records in 'storage_repositories' and 'repository_assignments' to the records in the 'repositories' table via the ID.

    Replication jobs are not included in the migration as they are not long lived and they can be rescheduled.

  • Praefect now generates a repository ID that can be used to join a repository's database records across tables. As a result, the virtual_storage and relative_path columns will no longer be needed in tables other than 'repositories', since joins will use the repository_id column. Praefect will also begin generating unique relative paths for replicas, so that stale disk state left behind by a deleted repository cannot affect its recreation. The replica_path column was added to the 'repositories' table to store the actual disk path of the replicas. Currently, every newly created 'repositories' record has replica_path set to relative_path. Before the new column can be used, historical records also need it filled in correctly. This commit adds a migration that fills the replica_path of existing records. Because each repository is currently stored at the path sent by the client, the column is filled with the relative_path.

  • With the repository ID present and backfilled in both the 'storage_repositories' and 'repository_assignments' tables, Praefect can now start joining the records using the repository_id instead of (virtual_storage, relative_path). To prepare for that, this commit indexes both tables on (repository_id, storage). In a later release, (virtual_storage, relative_path) can be dropped from both tables and the primary key can be changed to use the new indexes.
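
The three commits above can be illustrated with a minimal, self-contained sketch. It is not the migration from this merge request: it runs against an in-memory SQLite database rather than Praefect's PostgreSQL database, the schema is reduced to the columns named in the description, and the index names, sample rows, and the uniqueness of the new indexes are assumptions made for illustration.

```python
import sqlite3

# Hypothetical, simplified schema: only the columns named in the description.
SCHEMA = """
CREATE TABLE repositories (
    repository_id   INTEGER PRIMARY KEY,
    virtual_storage TEXT NOT NULL,
    relative_path   TEXT NOT NULL,
    replica_path    TEXT,
    UNIQUE (virtual_storage, relative_path)
);
CREATE TABLE storage_repositories (
    virtual_storage TEXT NOT NULL,
    relative_path   TEXT NOT NULL,
    storage         TEXT NOT NULL,
    repository_id   INTEGER,
    PRIMARY KEY (virtual_storage, relative_path, storage)
);
CREATE TABLE repository_assignments (
    virtual_storage TEXT NOT NULL,
    relative_path   TEXT NOT NULL,
    storage         TEXT NOT NULL,
    repository_id   INTEGER,
    PRIMARY KEY (virtual_storage, relative_path, storage)
);
"""

# Link the historical rows in one table to 'repositories' by looking up the
# ID through the old (virtual_storage, relative_path) key.
LINK_TEMPLATE = """
UPDATE {table}
SET repository_id = (
    SELECT r.repository_id
    FROM repositories AS r
    WHERE r.virtual_storage = {table}.virtual_storage
      AND r.relative_path   = {table}.relative_path
)
WHERE repository_id IS NULL;
"""


def make_demo_db():
    """Create an in-memory database with a mix of old and new records."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO repositories VALUES (?, ?, ?, ?)",
        [
            (1, "default", "@hashed/aa/bb.git", None),  # historical record
            (2, "default", "@hashed/cc/dd.git", "@hashed/cc/dd.git"),
        ],
    )
    conn.executemany(
        "INSERT INTO storage_repositories VALUES (?, ?, ?, ?)",
        [
            ("default", "@hashed/aa/bb.git", "gitaly-1", None),
            ("default", "@hashed/aa/bb.git", "gitaly-2", None),
            ("default", "@hashed/cc/dd.git", "gitaly-1", 2),  # already linked
        ],
    )
    conn.execute(
        "INSERT INTO repository_assignments VALUES (?, ?, ?, ?)",
        ("default", "@hashed/aa/bb.git", "gitaly-1", None),
    )
    conn.commit()
    return conn


def run_backfill(conn):
    """Apply the three steps: link IDs, fill replica_path, add indexes."""
    for table in ("storage_repositories", "repository_assignments"):
        conn.execute(LINK_TEMPLATE.format(table=table))
        # Index name and uniqueness are assumptions, not taken from the MR.
        conn.execute(
            f"CREATE UNIQUE INDEX {table}_repository_id_storage "
            f"ON {table} (repository_id, storage)"
        )
    # Each repository currently lives at the client-provided path.
    conn.execute(
        "UPDATE repositories SET replica_path = relative_path "
        "WHERE replica_path IS NULL"
    )
    conn.commit()


if __name__ == "__main__":
    db = make_demo_db()
    run_backfill(db)
    for row in db.execute(
        "SELECT storage, repository_id FROM storage_repositories "
        "ORDER BY repository_id, storage"
    ):
        print(row)
```

Replication jobs are left out of the sketch, matching the description's note that they are short-lived and can be rescheduled. Once every row carries a repository_id, queries can join on repository_id alone, and the (repository_id, storage) index can later take over from the old composite key.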

Related to #3485 (closed)

Edited by Sami Hiltunen
