Closed
Issue created Aug 26, 2021 by Will Chandler @wchandler (Maintainer)

Gitaly Cluster: data loss may occur when using 'repository_storage_moves' API to move project already in target storage

Fix

The fix for this issue is in

  • 14.3.0
  • 14.2.4
  • 14.1.6
  • 14.0.11
  • 13.12.12

Upgrade to one of these patch releases or later.

Issue:

If a projects/:project_id/repository_storage_moves API request is sent to migrate a project into a Gitaly Cluster storage that the project is already stored in, the project's repository may be deleted.

While reproducing, I observed deletions on roughly 25% of requests. When this occurs, the repository is deleted from all Gitaly nodes and removed from the Praefect database.

Steps to reproduce:

  1. Create a repository in the Gitaly Cluster storage
  2. Send /projects/:project_id/repository_storage_moves API calls to migrate the project to the same cluster storage
  3. The problem does not occur on every request, but after enough attempts the repository will be deleted from the Gitaly Cluster
     • Once this occurs, there is no way to retrieve the repository data other than restoring from backups
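Step 2 above can be sketched as a plain API call. This is a minimal illustration using Ruby's standard Net::HTTP; the host, token, and project ID are hypothetical placeholders, and the final request is left commented out so nothing is actually sent:

```ruby
require "net/http"
require "json"
require "uri"

# Hypothetical values -- substitute your own instance, token, and project ID.
GITLAB_HOST = "https://gitlab.example.com"
PROJECT_ID  = 42

# Build the repository_storage_moves request described in step 2:
# a POST asking GitLab to move the project to the "cluster" storage,
# even though the project already lives there.
def build_move_request(project_id, destination)
  uri = URI("#{GITLAB_HOST}/api/v4/projects/#{project_id}/repository_storage_moves")
  req = Net::HTTP::Post.new(uri)
  req["PRIVATE-TOKEN"] = "YOUR-TOKEN" # hypothetical token
  req["Content-Type"]  = "application/json"
  req.body = JSON.generate("destination_storage_name" => destination)
  [uri, req]
end

uri, req = build_move_request(PROJECT_ID, "cluster")
# To actually trigger a move (and, on affected versions, risk the bug):
# Net::HTTP.start(uri.host, uri.port, use_ssl: true) { |http| http.request(req) }
```

Repeating this call against an affected version is what eventually triggers the deletion.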

Notes

This triggered a customer emergency when it happened in their production environment, causing the deletion of five of their projects.

I've reproduced this on v14.2.1 on my own instance, see attached Praefect logs. The impacted customer was on v13.12.8.

@proglottis found that this may be due to the ServerInfo RPC returning an inconsistent filesystem_id when called against a Gitaly Cluster storage. We only attempt to remove the old repository when it appears to be on a different filesystem.
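The failure mode described above can be simulated in a few lines. This is a sketch, not GitLab's actual code: the class and method names are invented for illustration. It models a cluster whose ServerInfo response comes from a randomly chosen node (each with its own filesystem_id), and the "delete the source only if the filesystems differ" check:

```ruby
require "securerandom"

# Simulates a Gitaly Cluster where ServerInfo is proxied to a random
# backing node, each of which reports its own filesystem_id.
class FlakyClusterStorage
  def initialize(node_count)
    @ids = Array.new(node_count) { SecureRandom.uuid }
  end

  # Each call may be answered by a different node.
  def filesystem_id
    @ids.sample
  end
end

# The buggy decision: remove the old repository only when source and
# destination report different filesystem_ids. With an inconsistent id,
# the *same* storage can compare as a *different* filesystem.
def should_remove_source?(source_fs_id, destination)
  source_fs_id != destination.filesystem_id
end

cluster   = FlakyClusterStorage.new(3)
source_id = cluster.filesystem_id
# Across repeated move attempts, any call answered by a different node
# wrongly concludes the repo changed filesystems -- and deletes it.
deletions = 100.times.count { should_remove_source?(source_id, cluster) }
```

With three nodes, most trials hit a node other than the one that answered the first call, which matches the intermittent (but frequent) deletions observed during reproduction.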

irb(main):005:0> c.storage_info
=> <Gitaly::ServerInfoResponse::StorageStatus: storage_name: "cluster", readable: true, writeable: true, fs_type: "EXT_2_3_4", filesystem_id: "ceef773b-dfb2-4560-a207-abbf5eb3e460", replication_factor: 3>
irb(main):006:0> c.storage_info
=> <Gitaly::ServerInfoResponse::StorageStatus: storage_name: "cluster", readable: true, writeable: true, fs_type: "EXT_2_3_4", filesystem_id: "324fdb95-0a00-493b-9406-ada66bc14de6", replication_factor: 3>

The inconsistent id issue was previously fixed under #2596 (closed).
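As a hypothetical illustration of what a fix needs to guarantee (this is not the actual approach from #2596): one way to make a virtual storage's filesystem_id stable is to derive it deterministically from the storage name, for example as a name-based UUID built from a SHA-1 digest, so every node and every call reports the same value for the same cluster:

```ruby
require "digest"

# Hypothetical sketch: derive a stable, UUID-shaped filesystem_id from
# the virtual storage name, so repeated ServerInfo calls agree.
def stable_filesystem_id(virtual_storage_name)
  d = Digest::SHA1.hexdigest("gitaly-virtual-storage:#{virtual_storage_name}")
  [d[0, 8], d[8, 4], d[12, 4], d[16, 4], d[20, 12]].join("-")
end
```

With a deterministic id, the "different filesystem" check above can no longer fire spuriously for a move whose source and destination are the same cluster.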

This may have been introduced in v13.11 with !3302 (merged), where we now pick a Gitaly server at random, whereas previously we always routed to the primary.

/cc @mjwood

Edited Oct 05, 2021 by Ben Prescott @bprescott_