Gitaly Cluster: data loss may occur when using 'repository_storage_moves' API to move project already in target storage

Fix

The fix for this issue is in

Upgrade to one of those patch levels, or higher.

Issue:

If a /projects/:project_id/repository_storage_moves API request to migrate a project into a Gitaly Cluster is sent for a project that is already stored in that cluster, the project repository may be deleted.

When reproducing, I observed deletions on roughly 25% of requests. When this occurs, the repo is deleted from all Gitaly nodes and removed from the Praefect DB.

Steps to reproduce:

  1. Create a repository in the Gitaly Cluster storage
  2. Send /projects/:project_id/repository_storage_moves API calls to migrate the project to the same cluster storage it is already in
  3. The problem does not occur on every request, but after enough attempts the repo will be deleted from the Gitaly Cluster
  • Once this occurs, the repo data can only be recovered from backups
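
As a rough sketch of step 2, the request could be built in Ruby as shown here. The helper name build_storage_move_request, the host, the token, and the storage name "cluster" are placeholders for illustration and are not taken from the original report; the destination_storage_name parameter is the one the storage moves API accepts. The helper only builds the request and does not send it. Because sending it can delete the repository on affected versions, do that only against a disposable test instance.

```ruby
require "net/http"
require "json"
require "uri"

# Hypothetical helper: builds (but does not send) a repository storage
# move request for the given project. All argument values are placeholders.
def build_storage_move_request(base_url, project_id, token, destination)
  uri = URI.join(base_url, "/api/v4/projects/#{project_id}/repository_storage_moves")
  request = Net::HTTP::Post.new(uri)
  request["PRIVATE-TOKEN"] = token
  request["Content-Type"] = "application/json"
  # destination_storage_name is the storage the project should move to.
  request.body = JSON.generate(destination_storage_name: destination)
  [uri, request]
end

uri, request = build_storage_move_request(
  "https://gitlab.example.com", 42, "<your_access_token>", "cluster"
)
puts "#{request.method} #{uri}"
puts request.body

# To actually send it (test instances only):
#   Net::HTTP.start(uri.host, uri.port, use_ssl: true) { |http| http.request(request) }
```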

Notes

This triggered a customer emergency when it happened in their production environment, causing the deletion of five of their projects.

I've reproduced this on v14.2.1 on my own instance; see the attached Praefect logs. The impacted customer was on v13.12.8.

@proglottis found that this may be due to the ServerInfo RPC returning an inconsistent filesystem_id when called on a Gitaly Cluster storage. We only attempt to remove the old repo when it is on a different filesystem.

irb(main):005:0> c.storage_info
=> <Gitaly::ServerInfoResponse::StorageStatus: storage_name: "cluster", readable: true, writeable: true, fs_type: "EXT_2_3_4", filesystem_id: "ceef773b-dfb2-4560-a207-abbf5eb3e460", replication_factor: 3>
irb(main):006:0> c.storage_info
=> <Gitaly::ServerInfoResponse::StorageStatus: storage_name: "cluster", readable: true, writeable: true, fs_type: "EXT_2_3_4", filesystem_id: "324fdb95-0a00-493b-9406-ada66bc14de6", replication_factor: 3>

The inconsistent id issue was previously fixed under #2596 (closed).

This may have been introduced in v13.11 with !3302 (merged): we now pick a Gitaly server at random, whereas previously we always went to the primary.
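
To illustrate the suspected failure mode, here is a minimal sketch and not the actual GitLab or Gitaly code. FakeCluster, same_filesystem?, and the picker lambdas are hypothetical names invented for this example. The sketch models a virtual storage whose filesystem_id comes from whichever node happens to answer, and shows how a "same filesystem" comparison of a storage against itself can come out false when the answering node varies.

```ruby
# Hypothetical stand-in for a virtual storage fronting several Gitaly nodes.
# picker decides which node answers each ServerInfo-style call.
FakeCluster = Struct.new(:node_filesystem_ids, :picker) do
  def filesystem_id
    node_filesystem_ids[picker.call(node_filesystem_ids.size)]
  end
end

# Hypothetical check: treat source and destination as the same storage
# only when their reported filesystem ids match.
def same_filesystem?(source, destination)
  source.filesystem_id == destination.filesystem_id
end

# The two ids seen in the irb session above.
ids = [
  "ceef773b-dfb2-4560-a207-abbf5eb3e460",
  "324fdb95-0a00-493b-9406-ada66bc14de6"
]

# Always asking the same node (index 0 here) gives a stable answer.
always_primary = FakeCluster.new(ids, ->(_n) { 0 })
puts same_filesystem?(always_primary, always_primary)

# Picking a node at random can report two different ids for one storage,
# so the move is sometimes treated as cross-filesystem and the "old"
# repo, which is really the only copy, becomes eligible for removal.
random_node = FakeCluster.new(ids, ->(n) { rand(n) })
mismatches = 1000.times.count { !same_filesystem?(random_node, random_node) }
puts "mismatches in 1000 comparisons: #{mismatches}"
```

The mismatch rate in this toy model depends only on the number of nodes and is not meant to match the roughly 25% deletion rate observed above.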

/cc @mjwood

Edited by Ben Prescott