Org Migration: Design how to handle the shards table and its dependent tables

Background

See also https://handbook.gitlab.com/handbook/engineering/architecture/design-documents/cells/impacted_features/git-access

The shards table holds the name of each shard (also known as a Gitaly storage).

In #498991 (closed), the table was declared gitlab_main_cell.

The shards table is referenced by four other tables, all of which are also gitlab_main_cell:

  • group_wiki_repositories
  • pool_repositories
  • project_repositories
  • snippet_repositories
Entity relationship diagram of these tables:

erDiagram
    shards {
        int id PK
        string name "Name of Gitaly storage/shard"
    }
    
    group_wiki_repositories {
        int group_id PK,FK
        int shard_id FK
        string disk_path
    }
    
    "project_wiki_repositories delegates repository_storage to projects" {
        int id PK
        int project_id FK
    }
    
    pool_repositories {
        int id PK
        int shard_id FK
        string disk_path
        int source_project_id
    }
    
    project_repositories {
        int id PK
        int project_id FK
        int shard_id FK
        string disk_path
    }
    
    snippet_repositories {
        int snippet_id PK,FK
        int shard_id FK
        string disk_path
    }
    
    projects {
        int id PK
        string repository_storage "Name of Gitaly storage/shard"
        int pool_repository_id FK "Optional"
    }

    snippets {
        int id PK
    }

    namespaces {
        int id PK
    }
    
    group_wiki_repository_states {
        int id PK
        int group_wiki_repository_id FK
        string verification_state
    }
    
    project_states {
        int id PK
        int project_id FK
        string verification_state
    }
    
    wiki_repository_states {
        int id PK
        int project_wiki_repository_id FK
        int project_id FK
        string verification_state
    }
    
    snippet_repository_states {
        int id PK
        int snippet_repository_id FK
        string verification_state
    }
    
    shards ||--o{ group_wiki_repositories : "has many"
    shards ||--o{ pool_repositories : "has many"
    shards ||--o{ project_repositories : "has many"
    shards ||--o{ snippet_repositories : "has many"
    
    pool_repositories ||--o{ projects : "used by"
    
    group_wiki_repositories ||--o| group_wiki_repository_states : "has one"
    project_repositories ||--o| project_states : "has one"
    "project_wiki_repositories delegates repository_storage to projects" ||--o| wiki_repository_states : "has one"
    snippet_repositories ||--o| snippet_repository_states : "has one"
    
    projects ||--o| project_repositories : "has one"
    projects ||--o| "project_wiki_repositories delegates repository_storage to projects" : "has one"
    snippets ||--o| snippet_repositories : "has one"
    namespaces ||--o| group_wiki_repositories : "has one"

Problem

When the dependent tables are moved by Org Mover, how should we handle the shards table?

Example (screenshot): Screenshot_2024-12-11_at_5.55.27_PM

Each Cell will have its own unique Gitaly storages. By definition, the shards table will need to reflect these unique Gitaly storages.

After Org Mover moves a particular Gitaly repository from Cell A to Cell B, it will need to update the shard_id column of the moved rows so that it references a row in Cell B's shards table.
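The shard_id update described above can be illustrated with a minimal in-memory sketch. It is not how Org Mover is implemented: the `Cell` class, the `remap_shard_id` helper, the storage names, and the idea that the target storage name is already known when the row is moved are all assumptions made for illustration. The sketch only shows that a shard_id from Cell A cannot be copied as-is, because the same id may name a different storage (or nothing) in Cell B.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Cell:
    """Hypothetical in-memory stand-in for one Cell's shards table (id -> name)."""
    name: str
    shards: Dict[int, str] = field(default_factory=dict)

    def shard_id_for(self, storage_name: str) -> int:
        """Return the id of the shard row with this name, creating the row if missing."""
        for shard_id, existing_name in self.shards.items():
            if existing_name == storage_name:
                return shard_id
        new_id = max(self.shards, default=0) + 1
        self.shards[new_id] = storage_name
        return new_id


def remap_shard_id(row: dict, target: Cell, target_storage: str) -> dict:
    """Return a copy of a dependent-table row whose shard_id points at the target Cell's shard."""
    moved = dict(row)
    moved["shard_id"] = target.shard_id_for(target_storage)
    return moved


# Assumed example data: both Cells use shard id 1, but for different storages.
cell_a = Cell("cell-a", {1: "cell-a-storage-1"})
cell_b = Cell("cell-b", {1: "cell-b-storage-1", 2: "cell-b-storage-2"})

# A project_repositories-like row as it exists in Cell A.
row_in_a = {"project_id": 42, "shard_id": 1, "disk_path": "@hashed/ab/cd/abcd"}

# After the repository is moved to "cell-b-storage-2", the row must reference Cell B's shard.
row_in_b = remap_shard_id(row_in_a, cell_b, "cell-b-storage-2")
print(row_in_b["shard_id"])  # prints 2
```

Looking up the shard by name rather than by id is the design choice here: names identify Gitaly storages, while ids are only meaningful within one Cell's shards table. Whether this remapping happens during replication or only at cutover is one of the open questions in the proposal.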

Proposal

  • Investigate and decide how Org Mover will operate during replication and during cutover
  • Investigate and decide whether any application changes are needed in the Rails monolith

Decision

Acceptance Criteria

Edited by Michael Kozono