Figure out how to do a bulk checksum for all projects in database

Description

In gitlab-org/gitlab-ee#4469 we are investigating how to ensure that a repository on the secondary matches the same repository on the primary. In this issue, we'll focus on how to calculate the initial checksum for all projects in the database.

Proposal

  1. Use the shelling-out approach (https://gitlab.com/gitlab-org/gitlab-ee/issues/4755#note_57723358), restricted to refs/heads and refs/tags, to calculate the checksum.

  2. Create a new table project_state with the following columns:

     - project_id
     - repository_checksum
     - last_repository_check_at
     - wiki_checksum
     - last_wiki_check_at

  3. Create a background migration to backfill this table:

     - Start with less active projects
     - Update e.g. 1000 rows per job with a 5-minute interval (?)

  4. Create a background job that will be triggered once a day:

     - Scan the projects table for recently updated projects, e.g. those updated in the last 24 hours
     - Update the checksum
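
To make the proposal easier to discuss, here is a rough sketch of the checksum and batching steps. It is not the command from the linked note: the use of `git show-ref --heads --tags`, SHA-1 as the digest, the "<sha> <ref>" line format, and the names `checksum_from_refs`, `repository_checksum`, and `schedule_batches` are all assumptions made for illustration.

```python
import hashlib
import subprocess
from datetime import timedelta


def checksum_from_refs(show_ref_output):
    """Digest "<sha> <refname>" lines, keeping only refs/heads and refs/tags.

    The digest algorithm (SHA-1) and the line format are assumptions.
    """
    refs = []
    for line in show_ref_output.splitlines():
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        sha, ref = parts
        if ref.startswith(("refs/heads/", "refs/tags/")):
            refs.append((ref, sha))
    # Sort by ref name so the result does not depend on output order.
    refs.sort()
    payload = "\n".join(f"{sha} {ref}" for ref, sha in refs)
    return hashlib.sha1(payload.encode("utf-8")).hexdigest()


def repository_checksum(repo_path):
    """Shell out to git and checksum the branch and tag refs of one repository."""
    result = subprocess.run(
        ["git", "-C", repo_path, "show-ref", "--heads", "--tags"],
        capture_output=True,
        text=True,
    )
    # git show-ref exits with 1 when no refs match (e.g. an empty repository).
    if result.returncode not in (0, 1):
        raise RuntimeError(result.stderr.strip())
    return checksum_from_refs(result.stdout)


def schedule_batches(project_ids, batch_size=1000, interval=timedelta(minutes=5)):
    """Yield (delay, batch) pairs; batch N is delayed by N * interval."""
    for index, start in enumerate(range(0, len(project_ids), batch_size)):
        yield index * interval, project_ids[start:start + batch_size]
```

With these defaults, 2500 project IDs would be split into three jobs delayed by 0, 5, and 10 minutes. Ordering the IDs so that less active projects come first is left to the caller.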

/cc @stanhu @toon @digitalmoksha

Edited by Douglas Barbosa Alexandre