Figure out how to do a bulk checksum for all projects in database
Description
In gitlab-org/gitlab-ee#4469 we are investigating how to ensure that a repository on the secondary matches the repository on the primary. In this issue, we'll focus on how to calculate the initial checksum for all projects in the database.
Proposal
- Use the shelling out approach https://gitlab.com/gitlab-org/gitlab-ee/issues/4755#note_57723358, restricted to `refs/heads` and `refs/tags`, to calculate the checksum
- Create a new table `project_state` with the following columns:
  - `project_id`
  - `repository_checksum`
  - `last_repository_check_at`
  - `wiki_checksum`
  - `last_wiki_check_at`
- Create a background migration to backfill this table:
  - Starts with less active projects
  - Updates e.g. 1000 rows per job with a 5-minute interval (?)
- Create a background job that will be triggered once a day:
  - Scans the `projects` table for recently updated projects, e.g. in the last 24 hours
  - Updates the checksum
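
To make the checksum step concrete, here is a minimal sketch of shelling out to git and hashing only the refs under `refs/heads` and `refs/tags`. It is not necessarily what the linked note does: the use of `git for-each-ref`, the SHA-256 hash, and the function names `checksum_from_refs` and `repository_checksum` are all assumptions made for illustration.

```python
import hashlib
import subprocess

# Only branches and tags are covered, as in the proposal.
REF_PREFIXES = ("refs/heads", "refs/tags")


def checksum_from_refs(ref_lines):
    """Hash '<object id> <ref name>' lines into one hex digest.

    Lines are sorted first so the result does not depend on input order.
    SHA-256 is an assumption; the linked note may use a different hash.
    """
    digest = hashlib.sha256()
    for line in sorted(ref_lines):
        digest.update(line.encode("utf-8") + b"\n")
    return digest.hexdigest()


def repository_checksum(repo_path):
    """Shell out to git and checksum the refs under REF_PREFIXES."""
    output = subprocess.run(
        ["git", "-C", repo_path, "for-each-ref",
         "--format=%(objectname) %(refname)", *REF_PREFIXES],
        check=True, capture_output=True, text=True,
    ).stdout
    return checksum_from_refs(output.splitlines())
```

Two repositories with the same branches and tags pointing at the same objects would get the same checksum under this scheme, which is the property needed to compare primary and secondary.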
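
The `project_state` table could look roughly like the sketch below. The proposal only names the columns, so the types and the primary key are assumptions, and SQLite is used here only to keep the example runnable on its own.

```python
import sqlite3

# Column types and the primary key are assumptions;
# the proposal only lists the column names.
CREATE_PROJECT_STATE = """
CREATE TABLE project_state (
    project_id               INTEGER PRIMARY KEY,
    repository_checksum      TEXT,
    last_repository_check_at TEXT,
    wiki_checksum            TEXT,
    last_wiki_check_at       TEXT
)
"""


def create_project_state(conn):
    """Create the project_state table on the given connection."""
    conn.execute(CREATE_PROJECT_STATE)


conn = sqlite3.connect(":memory:")
create_project_state(conn)
# PRAGMA table_info yields one row per column; index 1 is the column name.
columns = [row[1] for row in conn.execute("PRAGMA table_info(project_state)")]
```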
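
The backfill schedule can be sketched as follows: order projects from least to most active, cut them into batches of 1000, and delay each batch by a further 5 minutes. The helper name `plan_backfill` and the use of a `last_activity_at` value as the measure of activity are assumptions, and starting the first batch with no delay is just one possible choice.

```python
from datetime import timedelta

BATCH_SIZE = 1000                 # rows per job, from the proposal
INTERVAL = timedelta(minutes=5)   # delay between jobs, from the proposal


def plan_backfill(projects, batch_size=BATCH_SIZE, interval=INTERVAL):
    """Return (delay, project_ids) pairs, least active projects first.

    `projects` is an iterable of (project_id, last_activity_at) tuples;
    using last_activity_at as the activity measure is an assumption.
    """
    ordered = [pid for pid, _ in sorted(projects, key=lambda p: p[1])]
    return [
        (interval * (start // batch_size), ordered[start:start + batch_size])
        for start in range(0, len(ordered), batch_size)
    ]
```

With these numbers, 2500 projects would be split into three jobs scheduled 0, 5, and 10 minutes out, the last one covering the remaining 500 rows.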
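
For the daily job, selecting the projects to re-checksum could be as simple as the sketch below. It reads "recently updated" as an `updated_at` timestamp within the last 24 hours, which is an assumption, and `recently_updated` is a hypothetical helper name.

```python
from datetime import datetime, timedelta


def recently_updated(projects, now, window=timedelta(hours=24)):
    """Return ids of projects whose updated_at is within `window` of `now`.

    `projects` is an iterable of (project_id, updated_at) tuples; treating
    updated_at as the signal for "recently updated" is an assumption.
    """
    cutoff = now - window
    return [pid for pid, updated_at in projects if updated_at >= cutoff]


now = datetime(2018, 3, 1, 12, 0)
sample = [
    (1, now - timedelta(hours=1)),    # inside the window
    (2, now - timedelta(hours=25)),   # outside the window
    (3, now - timedelta(hours=24)),   # exactly on the cutoff, included
]
to_refresh = recently_updated(sample, now)
```

Each returned project id would then have its checksum recalculated and its row in `project_state` updated.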