Figure out how Geo should verify if all objects are correct
In gitlab-org/gitlab-ee#4469 we are investigating how to ensure that the repository on the secondary matches the repository on the primary. In this issue we'll focus on checking the integrity of the objects themselves (and not the refs).
We might want to use git fsck to verify that all objects in a repository are good.
But this raises some questions:
- What does git fsck actually do?
- What does it mean if there are bad objects? Can we still clone the repo?
- Does a clone also clone the bad objects?
And how should Geo respond to bad objects? It depends on the case:
- Primary good, Secondary good → GOOD
- Primary bad, Secondary good → GOOD for now
- Primary good, Secondary bad → Flag this as a bad repository
- Primary bad, Secondary bad → Everyone is bad
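The four cases above can be written down as a small lookup. This is only a sketch to make the matrix explicit: the `Action` names and the `geo_response` function are hypothetical and do not correspond to anything in the Geo codebase.

```python
from enum import Enum


class Action(Enum):
    # Hypothetical labels for the four outcomes listed above.
    GOOD = "good"
    GOOD_FOR_NOW = "good for now"
    FLAG_SECONDARY = "flag this as a bad repository"
    EVERYONE_BAD = "everyone is bad"


def geo_response(primary_ok: bool, secondary_ok: bool) -> Action:
    """Map the fsck result of primary and secondary to an outcome."""
    if primary_ok and secondary_ok:
        return Action.GOOD
    if not primary_ok and secondary_ok:
        return Action.GOOD_FOR_NOW
    if primary_ok and not secondary_ok:
        return Action.FLAG_SECONDARY
    return Action.EVERYONE_BAD
```

Keeping the mapping in one function means the open question (what to do once a secondary is flagged) only has to be answered in one place.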
The most interesting case (at first) might be "Primary good, Secondary bad". How will we recover from this?
Action plan
- Figure out why the repository check is not doing anything on gitlab.com: gitlab-org/gitlab-ce#45046
- Run git fsck on Geo secondary: gitlab-org/gitlab-ee#5564
- Avoid running git fsck when the repository does not yet exist. This might share info with repo checksumming (issue TBD)
- Figure out how we can use the output of git fsck in a more meaningful way:
  - count the number of objects of each type?
  - ???
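As a starting point for the last two items, here is a rough sketch of skipping git fsck when the repository is missing and of counting objects per type. It assumes the `git` binary is on the PATH; the helper names (`repo_exists`, `run_fsck`, `parse_batch_check`, `count_objects_by_type`) are made up for illustration, and the counting uses `git cat-file --batch-all-objects --batch-check` rather than the fsck output itself.

```python
import os
import subprocess
from collections import Counter


def repo_exists(path):
    """Assumption: the repo exists if the directory is there and git accepts it."""
    if not os.path.isdir(path):
        return False
    result = subprocess.run(
        ["git", "-C", path, "rev-parse", "--git-dir"],
        capture_output=True, text=True,
    )
    return result.returncode == 0


def run_fsck(path):
    """Return (ok, output); ok is None when there is no repository to check."""
    if not repo_exists(path):
        return None, ""
    result = subprocess.run(
        ["git", "-C", path, "fsck", "--no-dangling"],
        capture_output=True, text=True,
    )
    # git fsck exits non-zero when it finds problems.
    return result.returncode == 0, result.stdout + result.stderr


def parse_batch_check(text):
    """Count objects per type from '<sha> <type> <size>' lines."""
    counts = Counter()
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 3:  # skips e.g. '<sha> missing' lines
            counts[parts[1]] += 1
    return counts


def count_objects_by_type(path):
    result = subprocess.run(
        ["git", "-C", path, "cat-file", "--batch-all-objects", "--batch-check"],
        capture_output=True, text=True,
    )
    return parse_batch_check(result.stdout)
```

Comparing the per-type counts from the primary and the secondary would give a cheap first signal before looking at individual objects, though equal counts alone do not prove the objects are identical.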
Edited by Toon Claes