Gitaly Cluster: data loss may occur when using the 'repository_storage_moves' API to move a project already in the target storage
Fix
The fix for this issue is in
Upgrade to one of those patch levels, or higher.
Issue:
If a projects/:project_id/repository_storage_moves API request to migrate a project into a Gitaly Cluster is sent for a project already stored in it, the project repository may be deleted.
When reproducing, I observed deletions on roughly 25% of requests. When a deletion occurs, the repo is deleted from all Gitaly nodes and removed from the Praefect DB.
Steps to reproduce:
- Create a repository in the Gitaly Cluster storage
- Send /projects/:project_id/repository_storage_moves API calls to migrate the project to the cluster storage
- The problem does not occur on every request, but after enough attempts the repo will be deleted from the Gitaly Cluster
- There is no way to retrieve the repo data other than backups once this occurs
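The steps above can be sketched as a small Ruby script. This is a minimal sketch, not the exact commands used for the reproduction: the `/api/v4` prefix, the `destination_storage_name` parameter and the `PRIVATE-TOKEN` header reflect my understanding of the GitLab REST API and should be checked against the API docs for your version. The helper names, host, token and pause length are hypothetical. Only run it against a disposable test instance, since a successful reproduction deletes the repository.

```ruby
require "net/http"
require "uri"

# Builds (but does not send) a storage move request for one project.
# Assumption: the endpoint lives under /api/v4 and accepts
# destination_storage_name as a form parameter.
def build_storage_move_request(base_url, project_id, token, destination)
  uri = URI.join(base_url, "/api/v4/projects/#{project_id}/repository_storage_moves")
  request = Net::HTTP::Post.new(uri)
  request["PRIVATE-TOKEN"] = token
  request.set_form_data("destination_storage_name" => destination)
  request
end

# Sends a previously built request and returns the HTTP response.
def send_storage_move(request)
  uri = request.uri
  Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == "https") do |http|
    http.request(request)
  end
end

# Repeats the move several times, because the deletion does not
# happen on every request.
def reproduce(base_url, project_id, token, destination, attempts: 20)
  attempts.times do |i|
    request = build_storage_move_request(base_url, project_id, token, destination)
    response = send_storage_move(request)
    puts "attempt #{i + 1}: HTTP #{response.code}"
    sleep 5 # hypothetical pause so one move can finish before the next
  end
end
```

Calling `reproduce("https://gitlab.example.com", 42, "<token>", "cluster")` would then issue the repeated moves; after each attempt, check whether the repository still exists on the Gitaly nodes.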
Notes
This triggered a customer emergency when it happened in their production environment, causing the deletion of five of their projects.
I've reproduced this on v14.2.1 on my own instance; see the attached Praefect logs. The impacted customer was on v13.12.8.
@proglottis found that this may be due to the ServerInfo RPC returning an inconsistent filesystem_id when called on a Gitaly Cluster storage. We only attempt to remove the old repo when it is on a different filesystem.
irb(main):005:0> c.storage_info
=> <Gitaly::ServerInfoResponse::StorageStatus: storage_name: "cluster", readable: true, writeable: true, fs_type: "EXT_2_3_4", filesystem_id: "ceef773b-dfb2-4560-a207-abbf5eb3e460", replication_factor: 3>
irb(main):006:0> c.storage_info
=> <Gitaly::ServerInfoResponse::StorageStatus: storage_name: "cluster", readable: true, writeable: true, fs_type: "EXT_2_3_4", filesystem_id: "324fdb95-0a00-493b-9406-ada66bc14de6", replication_factor: 3>
The inconsistent id issue was previously fixed under #2596 (closed).
This may have been introduced in v13.11 with !3302 (merged): we now pick a Gitaly server at random, whereas previously we always went to the primary.
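To illustrate the suspected mechanism, here is a simplified model in Ruby. It is not GitLab's or Praefect's actual code: `FakeClusterStorage`, `remove_source_after_move?` and the picker lambdas are hypothetical names, and the filesystem IDs are made up. The model only assumes what is described above: each node reports its own filesystem_id, the node answering storage_info may be chosen at random, and the old repo is removed when the source and destination IDs differ.

```ruby
# Simplified model of the suspected bug; not the real implementation.
StorageStatus = Struct.new(:storage_name, :filesystem_id)

# A cluster storage whose storage_info is answered by one of its nodes.
# The picker decides which node's filesystem_id is reported.
class FakeClusterStorage
  def initialize(name, node_filesystem_ids, picker)
    @name = name
    @node_filesystem_ids = node_filesystem_ids
    @picker = picker
  end

  def storage_info
    StorageStatus.new(@name, @picker.call(@node_filesystem_ids))
  end
end

# Hypothetical stand-in for the check described above: the old repo is
# removed only when source and destination look like different filesystems.
def remove_source_after_move?(source, destination)
  source.storage_info.filesystem_id != destination.storage_info.filesystem_id
end

ids = %w[fs-node-1 fs-node-2 fs-node-3] # made-up IDs

# Always asking the primary gives a stable ID, so a move onto the same
# storage never looks like a move between filesystems.
primary_only = FakeClusterStorage.new("cluster", ids, ->(list) { list.first })
puts remove_source_after_move?(primary_only, primary_only)

# Picking a node at random can yield two different IDs for the same
# storage, so the same move sometimes looks like a cross-filesystem move.
random_node = FakeClusterStorage.new("cluster", ids, ->(list) { list.sample })
removals = 1000.times.count { remove_source_after_move?(random_node, random_node) }
puts "#{removals} of 1000 simulated moves would remove the source repo"
```

The removal rate in this model depends on its own assumptions (three nodes, two independent picks) and is not meant to match the roughly 25% observed in practice. It only shows why an unstable filesystem_id can turn a move onto the same storage into a deletion.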
/cc @mjwood