Geo: Gitaly update causes checksum mismatch
I was doing some testing for https://gitlab.com/gitlab-org/gitlab-ee/issues/7619 and found a number of projects with a verification mismatch.
I told the secondary to resync those, but the verification kept failing.
After some digging in the Rails console, I discovered that the checksum stored on the primary was wrong:
pry(main)> project = Project.find(8)
=> #<Project id:8 h5bp/html5-boilerplate>
pry(main)> project.repository_state
=> #<ProjectRepositoryState:0x00005560f1cffa10
id: 7,
project_id: 8,
repository_verification_checksum: "b435b48b02d20aa16e41bc91c5dc6eabe9054bb9",
wiki_verification_checksum: "0000000000000000000000000000000000000000",
last_repository_verification_failure: nil,
last_wiki_verification_failure: nil,
repository_retry_at: nil,
wiki_retry_at: nil,
repository_retry_count: nil,
wiki_retry_count: nil>
pry(main)> project.repository.checksum
=> "dcca96d0db518ed749de4f580d33a5f0949318f4"
So verification on the secondary would never succeed, no matter how often it resynced.
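To make the failure mode concrete, here is a minimal, self-contained Ruby sketch of the comparison that keeps failing. It does not use GitLab's real classes: `StoredState` and `verify_secondary` are hypothetical stand-ins, assuming the secondary simply compares its freshly computed checksum with the value the primary stored earlier.

```ruby
# Hypothetical stand-in for the primary's stored repository state.
StoredState = Struct.new(:repository_verification_checksum)

# Hypothetical check: verification passes only if the secondary's freshly
# computed checksum equals the checksum the primary stored earlier.
def verify_secondary(primary_state, secondary_checksum)
  primary_state.repository_verification_checksum == secondary_checksum
end

# Values taken from the console session above.
stale_state = StoredState.new("b435b48b02d20aa16e41bc91c5dc6eabe9054bb9")
current = "dcca96d0db518ed749de4f580d33a5f0949318f4"

# Each resync gives the secondary the same repository, so it recomputes the
# same checksum every time and the comparison keeps failing.
results = 3.times.map { verify_secondary(stale_state, current) }
# → [false, false, false]

# Once the primary recalculates and stores the current value, it passes.
fixed = verify_secondary(StoredState.new(current), current) # → true
```

Resyncing alone cannot fix this, because the input that is wrong lives on the primary; every proposed solution below is a different way of getting the primary to recalculate.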
Cause
I think the initial checksum on the primary was calculated with an older Gitaly version, one that did not include the default branch in the checksum (see gitaly!731 (merged)). The secondary now calculates the checksum including the default branch, and so does the primary when it is called from the console.
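The sketch below illustrates why including one extra ref in the input changes the result. It assumes, as a simplification, that the repository checksum is the XOR of the SHA-1 digests of one `"<oid> <ref>"` line per ref; `repo_checksum` is a hypothetical helper, this is not Gitaly's actual code, and the object IDs are made up.

```ruby
require "digest"

# Hypothetical helper: XOR together the SHA-1 of each "<oid> <ref>" line.
# This is a simplified model, not Gitaly's actual implementation.
def repo_checksum(refs)
  value = refs.reduce(0) do |acc, (ref, oid)|
    acc ^ Digest::SHA1.hexdigest("#{oid} #{ref}").to_i(16)
  end
  format("%040x", value) # 40 hex digits, zero-padded
end

# Made-up object IDs for illustration only.
branches = {
  "refs/heads/master"  => "1111111111111111111111111111111111111111",
  "refs/heads/feature" => "2222222222222222222222222222222222222222"
}

# "Old" checksum: branches only. "New" checksum: also includes HEAD,
# which points at the default branch.
old_sum = repo_checksum(branches)
new_sum = repo_checksum(branches.merge("HEAD" => branches["refs/heads/master"]))

puts old_sum == new_sum # prints "false"
```

Because XOR is order-independent, the ref order does not matter, but adding any ref (here `HEAD`) changes the value. Under this model, a checksum stored before the algorithm change can never match one computed after it, even though the repository content is identical.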
Proposed solutions
1. Periodically reverify
Periodically recalculate the checksums on the primary, as discussed in https://gitlab.com/gitlab-org/gitlab-ee/issues/7347
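A minimal sketch of how the primary could pick repositories that are due for reverification. The `verified_at` timestamp and the 7-day interval are assumptions for illustration only: the `ProjectRepositoryState` shown above has no such column.

```ruby
# Hypothetical record: the real ProjectRepositoryState shown above has no
# verified_at column; it is assumed here for illustration.
RepoState = Struct.new(:project_id, :verified_at)

REVERIFY_INTERVAL = 7 * 24 * 60 * 60 # assumed interval: 7 days, in seconds

# Return the states whose checksum is old enough (or missing) to recalculate.
def due_for_reverification(states, now: Time.now, interval: REVERIFY_INTERVAL)
  states.select { |s| s.verified_at.nil? || now - s.verified_at >= interval }
end

now = Time.utc(2018, 9, 1)
states = [
  RepoState.new(8, now - 10 * 24 * 60 * 60), # verified 10 days ago
  RepoState.new(9, now - 1 * 24 * 60 * 60),  # verified yesterday
  RepoState.new(10, nil)                     # never verified
]

due = due_for_reverification(states, now: now).map(&:project_id) # → [8, 10]
```

With periodic reverification, a checksum computed by an older Gitaly would be replaced within one interval, at the cost of recalculating checksums that were already correct.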
2. Checksum version number
When we did the Gitaly update, we could have added a checksum_version column and bumped it to indicate that the algorithm used to calculate the checksum had changed.
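A sketch of how a `checksum_version` could be used during verification: compare values only when both sides used the same algorithm version, and otherwise ask the older side to recalculate. `checksum_version` is the proposed column, not an existing one; `VersionedChecksum`, `verification_result`, and the version numbers 1 and 2 are hypothetical.

```ruby
# Hypothetical pairing of a checksum with the proposed checksum_version.
VersionedChecksum = Struct.new(:value, :version)

# Compare checksums only when both sides used the same algorithm version;
# otherwise the side with the older version must be recalculated first.
def verification_result(primary, secondary)
  if primary.version != secondary.version
    primary.version < secondary.version ? :reverify_primary : :reverify_secondary
  elsif primary.value == secondary.value
    :verified
  else
    :mismatch
  end
end

# Assumed versions: 1 before the Gitaly change, 2 after it.
primary   = VersionedChecksum.new("b435b48b02d20aa16e41bc91c5dc6eabe9054bb9", 1)
secondary = VersionedChecksum.new("dcca96d0db518ed749de4f580d33a5f0949318f4", 2)

result = verification_result(primary, secondary) # → :reverify_primary
```

This turns the situation in this issue from an unexplained mismatch into an explicit "primary is stale" result, so only checksums produced by the old algorithm need to be recalculated.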
3. Make the secondary tell the primary to reverify
If the secondary sees recurring mismatches, the primary could expose an API endpoint that the secondary calls to force the primary to recalculate its checksum.
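A sketch of the secondary-side policy for this option: count consecutive mismatches per project and, after a threshold, ask the primary to reverify. The threshold of 3 and `MismatchTracker` are assumptions, and the lambda stands in for a request to an API endpoint that does not exist yet.

```ruby
# Hypothetical policy: after MAX_MISMATCHES consecutive failures, ask the
# primary to recalculate. The callable stands in for an API request to an
# endpoint that does not exist yet.
MAX_MISMATCHES = 3

class MismatchTracker
  def initialize(request_reverify, threshold: MAX_MISMATCHES)
    @request_reverify = request_reverify
    @threshold = threshold
    @counts = Hash.new(0)
  end

  # Record one verification result; returns true if a reverify was requested.
  def record(project_id, matched)
    if matched
      @counts.delete(project_id) # a successful verification resets the count
      return false
    end

    @counts[project_id] += 1
    return false if @counts[project_id] < @threshold

    @counts.delete(project_id) # start counting again after asking the primary
    @request_reverify.call(project_id)
    true
  end
end

requested = []
tracker = MismatchTracker.new(->(id) { requested << id })
3.times { tracker.record(8, false) }
requested # → [8]
```

Requiring several consecutive mismatches keeps a single transient failure (for example, a push racing the resync) from triggering a recalculation on the primary.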
4. Button in the primary's admin panel
Add a button to the primary's admin panel to recalculate the checksum. See https://gitlab.com/gitlab-org/gitlab-ee/issues/7617