
Add sharding key for packages_dependencies table

Description

Several approaches for handling the packages_dependencies table when an organization is moved were discussed here. The conclusion was that adding a project_id column to the packages_dependencies table, to be used as the sharding key, is the option we should consider, at the cost of duplicating rows.

Currently, rows in the packages_dependencies table are shared between packages. This needs to change so that dependencies are scoped to a project and shared only within that project.


Additionally, the sharding key needs to be set for the packages_dependencies table.

Update https://gitlab.com/gitlab-org/gitlab/-/blob/master/db/docs/packages_dependencies.yml to have:

allow_cross_foreign_keys:
  - gitlab_main_clusterwide
sharding_key:
  project_id: projects

Important: All sharding keys must be not nullable or have a NOT NULL check constraint.
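As a quick illustration of how the sharding_key entry reads, the fragment can be parsed in plain Ruby. This is only a sketch: the `DICTIONARY_FRAGMENT` constant and the `sharding_columns` helper are hypothetical names, and the other keys of the real packages_dependencies.yml file are omitted.

```ruby
require 'yaml'

# Fragment taken from this issue; the real dictionary file has more keys.
DICTIONARY_FRAGMENT = <<~YAML
  allow_cross_foreign_keys:
    - gitlab_main_clusterwide
  sharding_key:
    project_id: projects
YAML

# Hypothetical helper: returns the columns declared as sharding keys.
# Each sharding_key entry maps a column name to the table it references.
def sharding_columns(yaml_text)
  YAML.safe_load(yaml_text).fetch('sharding_key', {}).keys
end

puts sharding_columns(DICTIONARY_FRAGMENT).inspect
```

Every column returned here is one that must be not nullable or covered by a NOT NULL check constraint.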

Approximate implementation plan

MR 1

  • Add the project_id column to the packages_dependencies table.
  • Change the uniqueness validation for the Packages::Dependency model and the unique database indexes.
    • Change existing validation to perform only if project_id isn't present.
    • Add a new uniqueness validation for name, scoped to %i[version_pattern project_id] if project_id is present.
    • Delete the existing unique index index_packages_dependencies_on_name_and_version_pattern.
    • Add the new unique index on name, version_pattern WHERE project_id IS NULL.
    • Add the new unique index on name, version_pattern, project_id WHERE project_id IS NOT NULL.
  • Change the Packages::CreateDependencyService
    • Set project_id along with other attributes when creating a new dependency.
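The transitional uniqueness rules from MR 1 can be modeled in plain Ruby. This is a minimal sketch, not the Packages::Dependency model itself: the `Row` struct and the `uniqueness_key` and `conflict?` helpers are hypothetical names used only for illustration. In the real change the rules would be enforced by the model validations and the two partial unique indexes listed above.

```ruby
# Hypothetical in-memory stand-in for a packages_dependencies row.
Row = Struct.new(:name, :version_pattern, :project_id, keyword_init: true)

# Rows without project_id are compared on (name, version_pattern), mirroring
# the index WHERE project_id IS NULL. Rows with project_id are compared on
# (name, version_pattern, project_id), mirroring WHERE project_id IS NOT NULL.
def uniqueness_key(row)
  if row.project_id.nil?
    [:legacy, row.name, row.version_pattern]
  else
    [:scoped, row.name, row.version_pattern, row.project_id]
  end
end

# True when inserting the candidate would violate one of the two rules.
def conflict?(existing_rows, candidate)
  existing_rows.any? { |row| uniqueness_key(row) == uniqueness_key(candidate) }
end

existing = [
  Row.new(name: 'lodash', version_pattern: '^4.0.0', project_id: nil),
  Row.new(name: 'lodash', version_pattern: '^4.0.0', project_id: 1)
]
other_project = Row.new(name: 'lodash', version_pattern: '^4.0.0', project_id: 2)
puts conflict?(existing, other_project)
```

Under this model a legacy row and a project-scoped row with the same name and version pattern can coexist, which is what the backfill in MR 2 relies on.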

MR 2

  • Add a background migration to backfill project_id for existing rows and to create new entries in the packages_dependencies table for each unique combination of name, version_pattern, project_id. Here is a draft of the migration, which most likely should be optimized to use bulk operations:

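    Packages::DependencyLink.each_batch do |batch|
      batch.find_each do |dependency_link|
        dependency = dependency_link.dependency
        project_id = dependency_link.project_id

        # Nothing to do if the dependency already belongs to this link's project.
        next if dependency.project_id == project_id

        if dependency.project_id
          # The row is already owned by another project: re-use or create this
          # project's copy and re-point the link at it.
          new_dependency = Packages::Dependency.find_or_create_by!(
            project_id: project_id,
            name: dependency.name,
            version_pattern: dependency.version_pattern
          )
          dependency_link.update!(dependency_id: new_dependency.id)
        else
          # The first project to reach an unclaimed row takes ownership of it.
          dependency.update!(project_id: project_id)
        end
      end
    end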
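To make the intended outcome of the backfill concrete, here is a small in-memory simulation in plain Ruby. It is a sketch under assumptions, not the migration: the `Dependency` and `DependencyLink` structs and the `backfill!` method are hypothetical stand-ins for the tables and the batch loop, and the skip and re-use steps are assumptions the simulation makes explicit.

```ruby
# Hypothetical in-memory stand-ins for the two tables (not the real models).
Dependency = Struct.new(:id, :name, :version_pattern, :project_id, keyword_init: true)
DependencyLink = Struct.new(:id, :dependency_id, :project_id, keyword_init: true)

# Backfills project_id on dependencies and re-points links so that every link
# references a dependency owned by the link's own project.
def backfill!(dependencies, links)
  next_id = dependencies.map(&:id).max.to_i + 1

  links.each do |link|
    dep = dependencies.find { |d| d.id == link.dependency_id }

    if dep.project_id.nil?
      # The first project to reach an unclaimed row takes ownership of it.
      dep.project_id = link.project_id
    elsif dep.project_id != link.project_id
      # Re-use this project's copy if one exists, otherwise duplicate the row.
      copy = dependencies.find do |d|
        d.project_id == link.project_id && d.name == dep.name &&
          d.version_pattern == dep.version_pattern
      end
      unless copy
        copy = Dependency.new(id: next_id, name: dep.name,
                              version_pattern: dep.version_pattern,
                              project_id: link.project_id)
        next_id += 1
        dependencies << copy
      end
      link.dependency_id = copy.id
    end
  end
end

deps = [Dependency.new(id: 1, name: 'lodash', version_pattern: '^4.0.0', project_id: nil)]
links = [
  DependencyLink.new(id: 1, dependency_id: 1, project_id: 10),
  DependencyLink.new(id: 2, dependency_id: 1, project_id: 20)
]
backfill!(deps, links)
puts deps.map(&:project_id).inspect
```

One shared row referenced from two projects ends up as two rows, one per project, which is the duplication cost mentioned in the description.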

MR 3

  • Change the Packages::CreateDependencyService
    • Adjust the logic to re-use dependencies of the same project.
  • Change the project_id column of the packages_dependencies table to be NOT NULL (see docs).
  • Remove the old unique index on name, version_pattern WHERE project_id IS NULL.
  • Update packages_dependencies.yml to reference project_id as a sharding key.
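The re-use rule planned for Packages::CreateDependencyService in MR 3 can be sketched as a find-or-create keyed by project. This is a plain-Ruby illustration only: `ProjectDependencyRegistry` and `ScopedDependency` are hypothetical names, and the real service would operate on the packages_dependencies table rather than an in-memory hash.

```ruby
# Hypothetical in-memory registry standing in for packages_dependencies.
class ProjectDependencyRegistry
  ScopedDependency = Struct.new(:id, :project_id, :name, :version_pattern, keyword_init: true)

  def initialize
    @rows = {} # [project_id, name, version_pattern] => ScopedDependency
    @next_id = 1
  end

  # Returns the project's existing dependency row, or creates one.
  # Rows are never shared across projects.
  def find_or_create(project_id:, name:, version_pattern:)
    key = [project_id, name, version_pattern]
    @rows[key] ||= begin
      row = ScopedDependency.new(id: @next_id, project_id: project_id,
                                 name: name, version_pattern: version_pattern)
      @next_id += 1
      row
    end
  end

  def size
    @rows.size
  end
end

registry = ProjectDependencyRegistry.new
first  = registry.find_or_create(project_id: 1, name: 'rack', version_pattern: '~> 2.0')
second = registry.find_or_create(project_id: 1, name: 'rack', version_pattern: '~> 2.0')
puts first.equal?(second)
```

Two packages in the same project that declare the same dependency share one row, while the same dependency in another project gets its own row.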
Edited by Dzmitry (Dima) Meshcharakou