Ensure Package File checksum on primary
Problem to solve
The primary must generate checksums for all new and existing package files so that a secondary can verify that its copy of each file matches.
Intended users
- Systems administrators
To do
I've been using !45962 (closed) as a dev MR.
- !45695 (merged)
- !45964 (merged)
- !46998 (merged)
- !47361 (merged)
- !47260 (merged)
- !47372 (merged) Add background jobs to get things that need verification and backfill them up to verification capacity
  - Be careful to properly handle concurrency
  - Reuse a component that was added recently by another team to manage backfill work
- !47372 (merged) Add appropriate DB indexes
  - already added index on verification_state, which should be a good base
  - for queries used to find checksums that need to be backfilled
  - for queries used to produce counts
- !48006 (merged) Recover rows that started X hours ago (set to failed) (open follow up issue if this is more than weight 1)
- !49146 (merged) Documentation
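
The backfill, concurrency, and recovery items in the list above can be illustrated with a minimal in-memory sketch. This is not GitLab's implementation: the state names, the Record and Backfill classes, the claim_batch, verify, and fail_stale methods, the capacity of 2, the 60-second timeout, and the choice of SHA-256 are all assumptions made for the example. A lock stands in for the row-level locking a database would provide, so that two concurrent workers never claim the same record.

```python
import hashlib
import threading
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical state names; the issue only mentions a verification_state column.
PENDING, STARTED, SUCCEEDED, FAILED = "pending", "started", "succeeded", "failed"


@dataclass
class Record:
    id: int
    content: bytes
    state: str = PENDING
    started_at: Optional[float] = None
    checksum: Optional[str] = None


class Backfill:
    def __init__(self, records: List[Record], capacity: int):
        self.records = records
        self.capacity = capacity        # max records allowed in STARTED at once
        self._lock = threading.Lock()   # stands in for database row locking

    def claim_batch(self, now: float) -> List[Record]:
        """Atomically move pending records to STARTED, up to the free capacity."""
        with self._lock:
            in_flight = sum(r.state == STARTED for r in self.records)
            free = max(self.capacity - in_flight, 0)
            batch = [r for r in self.records if r.state == PENDING][:free]
            for r in batch:
                r.state, r.started_at = STARTED, now
            return batch

    def verify(self, record: Record) -> None:
        """Compute and store the checksum of a claimed record (SHA-256 assumed)."""
        digest = hashlib.sha256(record.content).hexdigest()
        with self._lock:
            record.checksum, record.state = digest, SUCCEEDED

    def fail_stale(self, now: float, timeout: float) -> int:
        """Mark records stuck in STARTED for longer than timeout as FAILED."""
        with self._lock:
            stale = [r for r in self.records
                     if r.state == STARTED and now - r.started_at > timeout]
            for r in stale:
                r.state = FAILED
            return len(stale)


records = [Record(i, f"pkg-{i}".encode()) for i in range(5)]
backfill = Backfill(records, capacity=2)
first = backfill.claim_batch(now=0.0)    # claims two records
second = backfill.claim_batch(now=1.0)   # claims nothing: capacity is used up
backfill.verify(first[0])                # one record finishes normally
stuck = backfill.fail_stale(now=100.0, timeout=60.0)  # the other never finished
```

Counting in-flight records and claiming new ones under the same lock is what keeps the number of started records at or below capacity; failing stale records frees that capacity again if a worker dies mid-job.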
Not a blocker for this issue:
Permissions and Security
N/A
Documentation
https://docs.gitlab.com/ee/administration/geo/replication/#limitations-on-replicationverification needs to be updated
Testing
TBD
What does success look like, and how can we measure that?
- Every Package file record on the primary will eventually have a checksum saved. "Eventually" depends on the hard-coded rate limit, how many unchecksummed Package files there are, and the rate of creation of new Package files that are too big to be checksummed synchronously.
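
As a rough illustration of how these three factors interact, the sketch below estimates how long a backlog takes to drain. The function name, its parameters, and the numbers are hypothetical; the actual rate limit is not stated in this issue.

```python
import math


def hours_to_drain(backlog: int, limit_per_hour: float, new_per_hour: float) -> float:
    """Estimate the hours until every file has a checksum.

    backlog: files that currently have no checksum
    limit_per_hour: checksums the backfill may compute per hour (the rate limit)
    new_per_hour: newly created files per hour that are too big to be
        checksummed synchronously and therefore join the backlog
    """
    if backlog <= 0:
        return 0.0
    net = limit_per_hour - new_per_hour
    if net <= 0:
        return math.inf  # the backlog never shrinks
    return backlog / net


estimate = hours_to_drain(backlog=90_000, limit_per_hour=1_000, new_per_hour=100)
print(estimate)  # 100.0
```

If large files arrive as fast as, or faster than, the rate limit allows them to be checksummed, the estimate is infinite, which is the case where "eventually" would not hold.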
What is the type of buyer?
- Premium
- Ultimate
Links / references
Edited by Michael Kozono