Created Apr 30, 2020 by James Ramsay (ex-GitLab) (@jramsay-gitlab)

Automatically repair repositories

Problem to solve

When a primary goes offline while some changes have not yet been replicated, the affected repositories are marked as read-only (#2630, closed) to prevent data loss. If the node comes back online, GitLab neither recovers automatically nor gives admins a way to recover other than accepting the data loss.

Praefect could fail over in error when the primary was only experiencing temporary failures, putting a large number of repositories into read-only mode. This would be frustrating for an SRE to address: they would rather have taken 30 seconds of outage than have a failover that leaves thousands of projects marked as read-only with no obvious way to resolve the situation.

A repository on a secondary node might get out of sync if a replication job fails to be processed. The repository stays outdated until another write comes in and triggers a new replication job.

Proposal

When Praefect notices an outdated repository and a more up-to-date copy is present on an available node, Praefect schedules a replication job to bring the outdated repository up to date again.
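
A minimal sketch of how the proposal could work, not Praefect's actual implementation. It assumes each replica carries a monotonically increasing version counter and a health flag. The names `Replica`, `ReplicationJob`, `PlanRepairs`, the storage names, and the version counter itself are all hypothetical and only serve to illustrate the idea.

```go
package main

import "fmt"

// Replica describes one copy of a repository on a storage node.
// Hypothetical type for illustration; not part of Praefect's API.
type Replica struct {
	Storage   string // name of the storage node holding this copy
	Version   int    // assumed counter: higher means more up to date
	Available bool   // whether the node is currently reachable
}

// ReplicationJob asks for Target to be brought up to date from Source.
type ReplicationJob struct {
	RelativePath string
	Source       string
	Target       string
}

// PlanRepairs returns the replication jobs needed to repair one repository.
// The source is the most up-to-date replica on an available node. Every
// other available replica with a lower version becomes a target.
// Unavailable replicas are skipped (an assumption of this sketch): they
// can be repaired by a later pass once their node is reachable again.
func PlanRepairs(relativePath string, replicas []Replica) []ReplicationJob {
	var source *Replica
	for i := range replicas {
		r := &replicas[i]
		if !r.Available {
			continue
		}
		if source == nil || r.Version > source.Version {
			source = r
		}
	}
	if source == nil {
		return nil // no available node to replicate from
	}

	var jobs []ReplicationJob
	for _, r := range replicas {
		if r.Available && r.Version < source.Version {
			jobs = append(jobs, ReplicationJob{
				RelativePath: relativePath,
				Source:       source.Storage,
				Target:       r.Storage,
			})
		}
	}
	return jobs
}

func main() {
	replicas := []Replica{
		{Storage: "gitaly-1", Version: 3, Available: true},
		{Storage: "gitaly-2", Version: 2, Available: true},
		{Storage: "gitaly-3", Version: 1, Available: false},
	}
	for _, job := range PlanRepairs("group/project.git", replicas) {
		fmt.Printf("replicate %s: %s -> %s\n", job.RelativePath, job.Source, job.Target)
	}
}
```

Running such a pass periodically, rather than only when a write arrives, would cover both cases described in the problem statement: a secondary that missed a replication job, and a former primary that returns holding changes the other nodes never received.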

Links / references

Edited Jul 07, 2020 by Sami Hiltunen
Milestone: 13.4