Enable checksumming of all data
Summary
Geo replicates data from a primary site to one or more secondary sites. We ensure that all data transferred from the primary site to a secondary site is automatically verified after the transfer, so users can be sure that no data was corrupted along the way. The Geo verification mechanism compares checksums: if the checksum calculated for the data on the primary site matches the checksum calculated for the data on the secondary site, the data has been transferred successfully.
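To make the comparison concrete, here is a minimal sketch of checksum-based verification. It is not the Geo implementation: the use of SHA-256, the chunked file read, and the names `file_checksum` and `transfer_verified` are assumptions chosen for illustration.

```python
import hashlib
from pathlib import Path


def file_checksum(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks.

    SHA-256 is an assumption for this sketch; the source does not say
    which algorithm Geo verification uses.
    """
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read fixed-size chunks so large files are never fully loaded in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def transfer_verified(primary_copy: Path, secondary_copy: Path) -> bool:
    """Return True when both copies produce the same checksum."""
    return file_checksum(primary_copy) == file_checksum(secondary_copy)
```

In a real deployment the two checksums would be calculated on different sites and only the digests compared, so the data itself does not need to be transferred a second time.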
Problem to solve
Geo, Cells Mover, and any GitLab backup solution all replicate or copy data from one place to another. The mechanisms differ, but each results in separate copies of the data being stored in different places. Files may get corrupted during the process, which can cause data loss when a Geo secondary site is promoted during a failover, an organization is moved to another cell, or a backup file is restored. Data integrity is therefore essential to all of these solutions, yet checksumming is currently enabled only when Geo is enabled on the primary site. To solve this problem, we need to enable checksumming of all data on a GitLab instance, whether Geo is enabled or not.
Proposal
- Enable checksumming of all data on any primary GitLab instance.
- Add or update the existing Rake tasks to show the verification status. A UI that shows this information to administrators can follow in a separate issue.
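
As a rough illustration of what a verification status report could contain, the sketch below tallies records by state. Everything in it is hypothetical: the `VerificationRecord` fields, the state names `verified`, `failed`, and `pending`, and the `summarize` helper are not taken from GitLab. It assumes each record keeps a stored checksum and a freshly recalculated one.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class VerificationRecord:
    """Hypothetical record for one piece of checksummed data."""
    name: str
    stored_checksum: Optional[str]   # checksum saved when the data was first verified
    current_checksum: Optional[str]  # checksum recalculated from the data as it is now


def verification_state(record: VerificationRecord) -> str:
    """Classify a record as 'pending', 'verified', or 'failed' (assumed state names)."""
    if record.stored_checksum is None or record.current_checksum is None:
        return "pending"
    if record.stored_checksum == record.current_checksum:
        return "verified"
    return "failed"


def summarize(records: list[VerificationRecord]) -> dict[str, int]:
    """Count records per state, the kind of summary a status task might print."""
    return dict(Counter(verification_state(record) for record in records))
```

A status task built along these lines would let administrators see how much data has been verified, how much is still pending, and which items need attention, before any UI exists.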