Restore missing container repositories under existing projects (part 1/2)
Context
This is related to Restore missing container repositories under ex... (&9619). The intent is to perform a data repair to restore missing container repositories under existing projects.
The high-level strategy for the data repair is described here. The actual implementation plan was detailed here and split into two parts. This issue is for the implementation of part 1/2.
Implementation
Requirements
To make this happen we'll need a few assets:
- New temporary table with columns `project_id` (FK for `projects`), `missing_count` (int), `status` (text), and `updated_at`. For brevity, we'll refer to this table as `t`.
- A limited capacity worker to perform the data repair analysis.
- An application setting to control the max concurrency for the worker (default to `2`).
- A feature flag to enable/disable the worker execution.
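As a rough illustration of the temporary table listed above, the following Python sketch creates it in an in-memory SQLite database. Only the column names come from the requirements; the primary key choice, the `status` default value, the minimal `projects` table, and the use of SQLite are assumptions made for the example. The real table would be created through a regular database migration.

```python
import sqlite3

# Illustrative schema only: column names come from the requirements above,
# while the primary key choice and the 'status' default are assumptions.
SCHEMA = """
CREATE TABLE projects (id INTEGER PRIMARY KEY, full_path TEXT NOT NULL);
CREATE TABLE t (
    project_id    INTEGER PRIMARY KEY REFERENCES projects (id),
    missing_count INTEGER NOT NULL DEFAULT 0,
    status        TEXT    NOT NULL DEFAULT 'analyzed',
    updated_at    TEXT    NOT NULL DEFAULT CURRENT_TIMESTAMP
);
"""


def create_schema(conn: sqlite3.Connection) -> None:
    # SQLite does not enforce foreign keys unless asked to.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)


conn = sqlite3.connect(":memory:")
create_schema(conn)
conn.execute("INSERT INTO projects (id, full_path) VALUES (1, 'group/project')")
conn.execute("INSERT INTO t (project_id, missing_count) VALUES (1, 3)")
```

Using `project_id` as the primary key is one way to guarantee at most one row per analyzed project, which also makes "does not appear in `t`" a cheap lookup.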
Logic
The background job should do the following work:
- "Loop over" (cron scheduling) all projects that do not appear in `t` (i.e. skip those that were already analyzed);
- For each project `P`:
  - Query the container registry for the list of non-empty (at least one tag) repositories under `P`'s full path. This should be done by calling the new List Sub Repositories API.
  - For each repository `R` in the returned list (paginated response):
    - Check if `R` exists on the Rails side (`container_repositories` table);
    - If it is missing, increment a counter of "missing repositories" for `P`.
  - Once done iterating over repositories under `P`, insert a row in `t` for `P`. `t.missing_count` should be set to the value of the above counter.

    Note: As we'll be looping over all `projects` (millions of rows) and inserting a record for each in `t` (same quantity), it can be advisable to perform a bulk insert. In this case, we can stash inserts for up to `N` `P`s and only then flush them to the database. However, because we'll be doing `1+N` network requests to the registry for each `P`, we must ensure that we flush any stashed inserts in case an exception occurs (e.g. network timeout). Otherwise, when the worker resumes it will pick `P`s that were already analyzed but not recorded due to a previous failure.
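The loop above, including the "flush on failure" behaviour from the note, can be sketched as follows. This is a language-agnostic illustration in Python, not the worker implementation: `fetch_page`, `flush`, `known_repositories`, the cursor-based page shape, and `batch_size` are all hypothetical stand-ins for the registry API call, the bulk insert into `t`, and the lookup in `container_repositories`. The `projects` argument is assumed to already exclude projects present in `t`.

```python
from typing import Callable, Iterable, Iterator, List, Optional, Set, Tuple

# Hypothetical page shape: (repository paths, cursor for the next page or None).
Page = Tuple[List[str], Optional[str]]
FetchPage = Callable[[str, Optional[str]], Page]


def iter_sub_repositories(fetch_page: FetchPage, project_path: str) -> Iterator[str]:
    """Yield every repository path under project_path, following pagination."""
    cursor: Optional[str] = None
    while True:
        repos, cursor = fetch_page(project_path, cursor)
        yield from repos
        if cursor is None:
            return


def analyze_projects(projects: Iterable[Tuple[int, str]],
                     fetch_page: FetchPage,
                     known_repositories: Set[str],
                     flush: Callable[[List[Tuple[int, int]]], None],
                     batch_size: int = 100) -> None:
    """Count missing repositories per project and record them in batches."""
    pending: List[Tuple[int, int]] = []  # stashed (project_id, missing_count) rows
    try:
        for project_id, full_path in projects:
            missing = sum(
                1 for repo in iter_sub_repositories(fetch_page, full_path)
                if repo not in known_repositories
            )
            pending.append((project_id, missing))
            if len(pending) >= batch_size:
                flush(pending)
                pending = []
    finally:
        # Runs on success and on failure (e.g. a registry timeout), so that
        # projects already analyzed are recorded before the worker stops.
        if pending:
            flush(pending)


# Usage with a fake registry: project 3 times out, projects 1 and 2 are still recorded.
pages = {
    ("group/a", None): (["group/a", "group/a/img"], "next"),
    ("group/a", "next"): (["group/a/tools"], None),
    ("group/b", None): ([], None),
}


def fake_fetch(path: str, cursor: Optional[str]) -> Page:
    if path == "group/c":
        raise TimeoutError("registry timeout")
    return pages[(path, cursor)]


recorded: List[Tuple[int, int]] = []
try:
    analyze_projects([(1, "group/a"), (2, "group/b"), (3, "group/c")],
                     fake_fetch, {"group/a"}, recorded.extend)
except TimeoutError:
    pass  # the real worker would log and retry on the next scheduled run
```

The exception is deliberately re-raised after the flush, so a failure stays visible to whatever schedules the worker while the rows gathered so far are not lost.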
`t.missing_count` will allow us to:
- Identify how many missing repositories were found per project and in total. This will be used to assess the scale of the problem and fine-tune the approach for part 2/2 (the actual data repair);
- Act as the filter for projects so that we can narrow down the data repair loop in part 2/2 to projects whose `t.missing_count > 0`.
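To make the two uses of `t.missing_count` concrete, here is a small Python sketch against an in-memory SQLite table. The rows are made-up sample data and the reduced two-column table is an assumption for brevity. The actual queries would run against the real `t` table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (project_id INTEGER PRIMARY KEY, missing_count INTEGER NOT NULL)"
)
# Made-up sample rows: (project_id, missing_count).
conn.executemany("INSERT INTO t VALUES (?, ?)", [(1, 0), (2, 3), (3, 0), (4, 5)])

# Scale of the problem: affected projects and total missing repositories.
affected, total_missing = conn.execute(
    "SELECT COUNT(*), COALESCE(SUM(missing_count), 0) FROM t WHERE missing_count > 0"
).fetchone()

# Filter for part 2/2: only projects with at least one missing repository.
to_repair = [
    row[0]
    for row in conn.execute(
        "SELECT project_id FROM t WHERE missing_count > 0 ORDER BY project_id"
    )
]
```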