GitLab.org / gitaly / Issues / #2717
Closed
Open
Issue created Apr 30, 2020 by James Ramsay (ex-GitLab) (@jramsay-gitlab)

Automatically repair repositories

Problem to solve

When a primary goes offline with changes that haven't been replicated, the affected repositories are marked as read-only (#2630, closed) to prevent data loss. If the node comes back online, GitLab doesn't recover automatically, nor does it give admins a way to recover other than accepting the data loss.

Praefect could fail over in error when the primary was only experiencing temporary failures, putting a large number of repositories into read-only mode. This would be frustrating for an SRE to address: they would rather have taken 30 seconds of outage than have a failover that leaves thousands of projects marked as read-only with no obvious way to resolve the situation.

A repository on a secondary node can also get out of sync if a replication job fails to be processed. The repository stays outdated until another write comes in and triggers a new replication job.

Proposal

When Praefect notices an outdated repository and a more up-to-date copy is present on an available node, it schedules a replication job to bring the outdated repository up to date again.
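
The proposal could be sketched roughly as follows. This is only an illustration, not Praefect's actual code: `ReplicationJob`, `planRepairs`, the per-storage `generations` counter and the `healthy` set are all hypothetical names, and the sketch assumes that each copy of a repository carries a comparable version number and that only copies on available nodes can be repaired.

```go
package main

import (
	"fmt"
	"sort"
)

// ReplicationJob is a hypothetical description of one repair:
// copy Repository from storage Source to storage Target.
type ReplicationJob struct {
	Repository string
	Source     string
	Target     string
}

// planRepairs returns the jobs needed to bring outdated copies up to date.
// generations maps storage name -> version counter of its copy (an assumed
// way of telling which copy is newer). healthy marks the available storages.
func planRepairs(repo string, generations map[string]int, healthy map[string]bool) []ReplicationJob {
	// Sort the storage names so the result is deterministic.
	names := make([]string, 0, len(generations))
	for name := range generations {
		names = append(names, name)
	}
	sort.Strings(names)

	// Pick the most up-to-date copy that lives on an available node.
	source, best := "", -1
	for _, name := range names {
		if healthy[name] && generations[name] > best {
			source, best = name, generations[name]
		}
	}
	if source == "" {
		return nil // no available copy to replicate from
	}

	// Schedule a job for every available copy that is behind the source.
	var jobs []ReplicationJob
	for _, name := range names {
		if name != source && healthy[name] && generations[name] < best {
			jobs = append(jobs, ReplicationJob{Repository: repo, Source: source, Target: name})
		}
	}
	return jobs
}

func main() {
	// Hypothetical storage names and repository path.
	generations := map[string]int{"gitaly-1": 3, "gitaly-2": 5, "gitaly-3": 5}
	healthy := map[string]bool{"gitaly-1": true, "gitaly-2": true, "gitaly-3": true}
	for _, job := range planRepairs("@hashed/example.git", generations, healthy) {
		fmt.Printf("replicate %s: %s -> %s\n", job.Repository, job.Source, job.Target)
	}
}
```

Run periodically, a check like this would repair a stale secondary without waiting for the next write, and would let a node that comes back after a failover catch up from whichever available copy is newest.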

Links / references

Edited Jul 07, 2020 by Sami Hiltunen