Ensure Package File checksum on primary

Problem to solve

We need the primary to ensure that checksums are generated for all new and existing package files, so that a secondary can verify that its copy of each file matches.
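The check described above can be sketched in a few lines, assuming SHA-256 as the checksum algorithm (an assumption; this issue does not name one). The helper names `file_checksum` and `secondary_matches` are hypothetical and only illustrate the primary/secondary comparison:

```python
import hashlib
from pathlib import Path
from typing import Optional


def file_checksum(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Return the hex SHA-256 digest of a file (algorithm is an assumption).

    The file is read in chunks so large package files need not fit in memory.
    """
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def secondary_matches(primary_checksum: Optional[str], secondary_path: Path) -> bool:
    """Hypothetical secondary-side check against the checksum stored on the primary."""
    if primary_checksum is None:
        # The primary has not checksummed this file yet, so there is nothing to compare.
        return False
    return file_checksum(secondary_path) == primary_checksum
```

The `None` branch is why this issue matters: until the primary stores a checksum for a package file, a secondary has nothing to verify its copy against.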

Intended users

  • Systems administrators

To do

I've been using !45962 (closed) as a dev MR.

  • !45695 (merged)
  • !45964 (merged)
  • !46998 (merged)
  • !47361 (merged)
  • !47260 (merged)
  • !47372 (merged) Add background jobs that find records needing verification and backfill their checksums, up to verification capacity
    • Be careful to properly handle concurrency
    • Reuse a component that was added recently by another team to manage backfill work
  • !47372 (merged) Add appropriate DB indexes
    • An index on verification_state was already added, which should be a good base
    • Indexes for queries used to find rows whose checksums need to be backfilled
    • Indexes for queries used to produce counts
  • !48006 (merged) Recover rows that started verification more than X hours ago and never finished, by marking them as failed (open a follow-up issue if this is more than weight 1)
  • !49146 (merged) Documentation
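The backfill, concurrency, and stuck-row recovery items in the list above can be sketched as a small in-memory model. Everything here is an assumption for illustration: the state names (pending/started/succeeded/failed), the rule that capacity counts rows currently in `STARTED`, and the names `PackageFile`, `VerificationBackfill`, `claim_batch`, `finish`, and `recover_stale` are hypothetical. This is not the implementation from the MRs above, nor the backfill component from another team that the list mentions.

```python
import threading
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class VerificationState(Enum):
    # Hypothetical state names for a per-row verification state machine.
    PENDING = 0
    STARTED = 1
    SUCCEEDED = 2
    FAILED = 3


@dataclass
class PackageFile:
    id: int
    state: VerificationState = VerificationState.PENDING
    started_at: Optional[float] = None
    checksum: Optional[str] = None


class VerificationBackfill:
    """Hypothetical in-memory model of a limited-capacity checksum backfill."""

    def __init__(self, records: List[PackageFile], capacity: int, stale_after: float):
        self.records = records
        self.capacity = capacity        # max rows allowed in STARTED at once
        self.stale_after = stale_after  # seconds; stands in for "X hours"
        # In a database this would be a transaction with row locks; here a
        # single lock makes each claim/finish/recover step atomic.
        self._lock = threading.Lock()

    def claim_batch(self, now: float) -> List[PackageFile]:
        """Move up to (capacity - running) PENDING rows to STARTED so that
        concurrent workers never claim the same row or exceed capacity."""
        with self._lock:
            running = sum(r.state is VerificationState.STARTED for r in self.records)
            free = max(self.capacity - running, 0)
            pending = [r for r in self.records if r.state is VerificationState.PENDING]
            batch = pending[:free]
            for r in batch:
                r.state = VerificationState.STARTED
                r.started_at = now
            return batch

    def finish(self, record: PackageFile, checksum: Optional[str]) -> None:
        """Record the outcome: a checksum means success, None means failure."""
        with self._lock:
            if checksum is None:
                record.state = VerificationState.FAILED
            else:
                record.state = VerificationState.SUCCEEDED
                record.checksum = checksum
            record.started_at = None

    def recover_stale(self, now: float) -> int:
        """Mark rows stuck in STARTED longer than stale_after as FAILED
        (e.g. the worker died mid-checksum) and return how many were reset."""
        with self._lock:
            stale = [
                r for r in self.records
                if r.state is VerificationState.STARTED
                and now - r.started_at > self.stale_after
            ]
            for r in stale:
                r.state = VerificationState.FAILED
                r.started_at = None
            return len(stale)
```

In a database-backed version, the claim step would more likely be a single atomic `UPDATE` or a `SELECT ... FOR UPDATE SKIP LOCKED` rather than a process-local lock. Both the claim query and the capacity count filter on the verification state, which is why an index on verification_state is a natural base for the indexes listed above.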

Not a blocker for this issue:

  • !49292 (merged)

Permissions and Security

N/A

Documentation

https://docs.gitlab.com/ee/administration/geo/replication/#limitations-on-replicationverification needs to be updated

Testing

TBD

What does success look like, and how can we measure that?

  • Every Package file record on the primary will eventually have a checksum saved. "Eventually" depends on the hard-coded rate limit, how many unchecksummed Package files there are, and the rate of creation of new Package files that are too big to be checksummed synchronously.
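The "eventually" above can be made concrete with a simplified steady-state estimate, assuming constant rates. With a backlog of B unchecksummed files, a backfill rate limit of R files per hour, and new files too large to checksum synchronously arriving at a rate of a per hour, the backlog clears after roughly B / (R - a) hours, and never if R <= a. The function name and the numbers below are hypothetical; the actual hard-coded rate limit is not stated in this issue.

```python
import math


def hours_until_caught_up(backlog: int, rate_limit_per_hour: float,
                          async_arrivals_per_hour: float) -> float:
    """Simplified estimate of how long until every package file has a checksum.

    Only files too big to checksum synchronously count as arrivals, because
    smaller files get their checksum at creation time and never join the backlog.
    """
    net = rate_limit_per_hour - async_arrivals_per_hour
    if net > 0:
        return max(backlog, 0) / net
    # The backfill cannot outpace new work; it only finishes if nothing is left
    # and nothing new arrives.
    return 0.0 if backlog <= 0 and async_arrivals_per_hour <= 0 else math.inf
```

For example, `hours_until_caught_up(1_000_000, 1_000, 200)` returns `1250.0` under these made-up numbers.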

What is the type of buyer?

  • Premium
  • Ultimate

Links / references

Edited Dec 09, 2020 by Michael Kozono