How and when to verify repositories on the secondary?
As outlined in gitlab-org/gitlab-ee#4469, we want to verify repositories on the secondary match the repositories on the primary.
How do we do that on the secondary?
- repository checksum communicated via the DB
- use the last repository sync time to determine when to verify?
- once a repository has been quiet for a period of time? every night?
- what happens if the checksum doesn't match - how is that tracked and communicated?
- what about highly active projects? "Those aren’t normally a problem, because we would try to sync those anyway and a git pull on the secondary should quickly pick up inconsistencies."
/cc @stanhu @dbalexandre @toon
Initial Proposal
Assumption: a project's checksum is stored with the project in the database, and gets replicated into the secondary.
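The issue does not say how the repository checksum is computed. As one possible approach, assumed here and not specified by this proposal, the checksum could be derived from the repository's refs. Each "oid refname" pair is hashed and the digests are XORed together, so the result does not depend on the order in which refs are listed. The function name and input shape are hypothetical.

```python
import hashlib
from typing import Iterable, Tuple


def repository_checksum(refs: Iterable[Tuple[str, str]]) -> str:
    """Hypothetical ref-based checksum: XOR of SHA-1 digests of 'oid refname' lines.

    refs is an iterable of (oid, refname) pairs, e.g. parsed from `git show-ref`.
    XOR is commutative, so the result is independent of ref ordering.
    """
    acc = 0
    for oid, name in refs:
        digest = hashlib.sha1(f"{oid} {name}".encode("utf-8")).digest()
        acc ^= int.from_bytes(digest, "big")
    # Zero-pad to 40 hex characters (the size of a SHA-1 digest).
    return format(acc, "040x")
```

The primary would store this value with the project, and the secondary would compute the same function over its own refs and compare the two strings.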
Each time a project is synced on the secondary, its checksum is cleared in the ProjectRegistry.
Once a day, a job scans the ProjectRegistry for projects that have not been checksummed.
If a project is stable (it has not been synced in the last 6 hours, and the project record replicated from the primary has a recently computed checksum), compute the checksum on the secondary and verify that it matches the primary's checksum.
If the checksums do not match, mark the verification as failed in the ProjectRegistry.
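The daily job described above can be sketched as follows. This is a minimal illustration under several assumptions. RegistryEntry is a hypothetical in-memory stand-in for a ProjectRegistry row. primary_checksums maps project IDs to the checksums replicated from the primary. compute_checksum is a hypothetical callback that checksums the local copy. "Recently computed" is simplified to "present".

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable, Dict, List, Optional

# Quiet period from the proposal: skip projects synced within the last 6 hours.
STABILITY_WINDOW = timedelta(hours=6)


@dataclass
class RegistryEntry:
    """Hypothetical stand-in for a ProjectRegistry row."""
    project_id: int
    last_repository_synced_at: Optional[datetime]
    repository_checksum: Optional[str] = None  # cleared on every sync
    last_repository_verification_failure: Optional[str] = None
    last_repository_verification_at: Optional[datetime] = None


def verify_repositories(
    entries: List[RegistryEntry],
    primary_checksums: Dict[int, str],
    compute_checksum: Callable[[int], str],
    now: datetime,
) -> List[int]:
    """Verify stable, unchecksummed projects; return IDs whose checksums mismatch."""
    failed: List[int] = []
    for entry in entries:
        if entry.repository_checksum is not None:
            continue  # already checksummed since the last sync
        synced_at = entry.last_repository_synced_at
        if synced_at is None or now - synced_at < STABILITY_WINDOW:
            continue  # never synced, or synced too recently to be stable
        expected = primary_checksums.get(entry.project_id)
        if expected is None:
            continue  # the primary has no checksum for this project yet
        actual = compute_checksum(entry.project_id)
        entry.repository_checksum = actual
        entry.last_repository_verification_at = now
        if actual != expected:
            entry.last_repository_verification_failure = (
                f"checksum mismatch: expected {expected}, got {actual}"
            )
            failed.append(entry.project_id)
        else:
            entry.last_repository_verification_failure = None
    return failed
```

In this sketch, the local checksum is stored even on a mismatch. As a result, a failed project is not re-verified until its next sync clears the checksum again.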
Enhance the API endpoint GET /geo_nodes/current/failures to return these verification failures.
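Here is one way the endpoint could collect verification failures from registry rows. This is a rough sketch. The rows are plain dicts keyed by the proposed column names, and the response entry shape (project_id, type, message, verified_at) is hypothetical. It does not describe the existing API.

```python
from typing import Any, Dict, Iterable, List


def verification_failures(rows: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Collect repository and wiki verification failures from registry rows.

    Each row is a dict keyed by the proposed ProjectRegistry column names;
    the output entry shape is a hypothetical API response format.
    """
    failures: List[Dict[str, Any]] = []
    for row in rows:
        for kind in ("repository", "wiki"):
            message = row.get(f"last_{kind}_verification_failure")
            if message:
                failures.append({
                    "project_id": row["project_id"],
                    "type": kind,
                    "message": message,
                    "verified_at": row.get(f"last_{kind}_verification_at"),
                })
    return failures
```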
Possible new columns for the ProjectRegistry:
- repository_checksum
- last_repository_verification_failure
- last_repository_verification_at
- wiki_checksum
- last_wiki_verification_failure
- last_wiki_verification_at
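The proposed columns, together with the rule that a sync clears the checksum, can be sketched with an in-memory SQLite table. This is only an illustration. The column types, the last_repository_synced_at column (standing in for whatever sync timestamp the registry already keeps), and the mark_repository_synced helper are all hypothetical.

```python
import sqlite3

# Hypothetical schema: the proposed columns plus an assumed sync timestamp.
SCHEMA = """
CREATE TABLE project_registry (
    id INTEGER PRIMARY KEY,
    project_id INTEGER NOT NULL UNIQUE,
    last_repository_synced_at TEXT,
    repository_checksum TEXT,
    last_repository_verification_failure TEXT,
    last_repository_verification_at TEXT,
    wiki_checksum TEXT,
    last_wiki_verification_failure TEXT,
    last_wiki_verification_at TEXT
)
"""


def mark_repository_synced(conn: sqlite3.Connection, project_id: int, synced_at: str) -> None:
    """Record a repository sync and clear its checksum so the daily job re-verifies it."""
    conn.execute(
        "UPDATE project_registry "
        "SET last_repository_synced_at = ?, repository_checksum = NULL "
        "WHERE project_id = ?",
        (synced_at, project_id),
    )
```

The wiki checksum is left untouched here, because a repository sync does not change the wiki.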
Related Issues/Epics: &58 (closed), #4746 (closed), #4755 (closed), #4756 (closed)