Implementation plan to restore missing repositories under existing projects

Context

This is part of Container Registry: Storage usage data repair (&9107). Please read its description for context.

Task

Create a Rails backend implementation plan for a data repair that restores missing container repositories under existing projects. The high-level strategy is described here. The referenced registry changes are already implemented. Now we need to devise a detailed implementation plan for the Rails background jobs, taking into account the described requirements.

Plan

We can split this data repair into two parts:

  • Part 1/2: Loop over all projects with the container registry feature enabled. For each, identify if it needs a data repair (i.e., there are repositories on the registry side for this project, but none on the Rails side) and record the result as a boolean flag;

  • Part 2/2: Loop over projects marked as needing a data repair. For each, re-create missing repositories on the Rails side.

Splitting the work in two lets us start small: part (1) tells us the scale of the problem, and only then do we fine-tune the plan, if needed, and proceed with the actual data repair in part (2).

To simplify, we can focus on projects that currently have the registry feature enabled. Since we're going to flag projects that 1) need a data repair and then 2) were repaired, we can deal with the remaining projects later, if and when needed.

Note: Here we're not worried about repositories that may exist on the Rails side but not on the registry side. For the purpose of this data repair (storage usage calculations), we're only worried about those that exist on the registry (actually taking storage space) but are unknown to Rails (so the user doesn't see them).

Part 1/2

The process should be as follows:

  1. Loop over all projects that 1) have the container registry feature enabled and 2) were not already analyzed *;

  2. For each project P:

    1. Query the container registry for the list of non-empty (at least one tag) repositories under P's full path. This should be done by calling the new List Sub Repositories API.

    2. For each repository R in the returned list:

      1. Check if R exists on the Rails side (container_repositories table);

      2. If it is missing, increment a counter of "missing repositories" for P.

    3. Once done iterating, log the number of missing repositories under P;

    4. Flag P as "needs registry data repair" or "does not need registry data repair" (on the database), depending on whether missing repositories were found.
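
The steps above can be sketched in plain Ruby as follows. This is only an illustration under stated assumptions, not the actual implementation: `Project`, `FakeRegistry`, `non_empty_repositories`, `analyze_project` and `analyze_all` are all hypothetical names, the registry client returns canned data instead of calling the List Sub Repositories API, and a `Set` of paths stands in for the container_repositories table. The flag is modelled as three-state (`nil` = not analyzed yet, `true` = needs repair, `false` = does not need repair).

```ruby
require 'set'

# Hypothetical stand-in for a project row.
# repair_needed: nil = not analyzed yet, true/false = result of the analysis.
Project = Struct.new(:full_path, :registry_enabled, :repair_needed)

# Hypothetical stand-in for the registry client. A real client would call the
# List Sub Repositories API; this one only returns canned repository paths.
class FakeRegistry
  def initialize(repos_by_project)
    @repos_by_project = repos_by_project
  end

  # Returns the paths of non-empty repositories under the given project path.
  def non_empty_repositories(project_path)
    @repos_by_project.fetch(project_path, [])
  end
end

# Counts repositories known to the registry but not to Rails, logs the count,
# and flags the project accordingly. Returns the number of missing repositories.
def analyze_project(project, registry, rails_repository_paths)
  missing = registry.non_empty_repositories(project.full_path).count do |path|
    !rails_repository_paths.include?(path)
  end

  puts "project=#{project.full_path} missing_repositories=#{missing}"
  project.repair_needed = missing.positive?
  missing
end

# Only visits projects with the registry enabled that were not analyzed yet,
# so the loop can be resumed after a failure without redoing finished work.
def analyze_all(projects, registry, rails_repository_paths)
  projects
    .select { |project| project.registry_enabled && project.repair_needed.nil? }
    .each { |project| analyze_project(project, registry, rails_repository_paths) }
end
```

Because already-flagged projects are filtered out, calling `analyze_all` a second time only picks up projects that were skipped or interrupted the first time.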

By looking at the "registry data repair" flag for each project, we can tell which ones have already been analyzed * (a project with no flag recorded has not been analyzed yet). This allows us to 1) track progress and 2) resume the "loop" over projects in case of failure. Later on, it will also allow us to actually repair these projects.

Low-Level Details

As highlighted in &9107 (comment 1154773359), it is probably best not to implement the process above using background migrations. Processing time is likely to vary widely from project to project, as it ultimately depends on the number of existing/missing repositories. Long runtimes in turn increase the chance of errors and interruptions.

For this reason, and to possibly allow reusing/extending the implementation of part 1/2 for the actual data repair (part 2/2), we'd be better off using a dedicated background job, and more precisely a limited capacity worker.

We should place the execution of this background job behind a feature flag. The max concurrency for this job should be configurable through an application setting and default to 2.
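
To illustrate how the feature flag and the max concurrency setting could interact, here is a minimal plain-Ruby sketch. It does not use the real limited capacity worker API; `RepairJobSettings`, `RepairCapacity`, `remaining_capacity` and `jobs_to_enqueue` are hypothetical names chosen for this example, and the only value taken from the plan is the default concurrency of 2.

```ruby
# Default taken from the plan above; everything else here is hypothetical.
DEFAULT_MAX_CONCURRENCY = 2

# Hypothetical container for the feature flag state and the application setting.
RepairJobSettings = Struct.new(:feature_enabled, :max_concurrency, keyword_init: true)

# Decides how many new jobs may be started given what is already running.
class RepairCapacity
  def initialize(settings, running_jobs:)
    @settings = settings
    @running_jobs = running_jobs
  end

  # No capacity at all while the feature flag is disabled.
  def remaining_capacity
    return 0 unless @settings.feature_enabled

    limit = @settings.max_concurrency || DEFAULT_MAX_CONCURRENCY
    [limit - @running_jobs, 0].max
  end

  # Never enqueue more jobs than there are pending projects.
  def jobs_to_enqueue(pending_projects_count)
    [remaining_capacity, pending_projects_count].min
  end
end
```

With the flag enabled, a limit of 2 and one job already running, `jobs_to_enqueue(10)` allows one more job. Disabling the flag drops the capacity to zero, which stops new work from being picked up.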

Part 2/2

Note: We should revisit the plan here once we know the scale (number of missing repositories) after part 1/2.

The process should be as follows:

  1. Loop over all projects that 1) have the container registry feature enabled, 2) were marked as "needs registry data repair" in part 1/2, and 3) were not yet repaired;

  2. For each project P:

    1. Query the container registry for the list of non-empty (at least one tag) repositories under P's full path. This should be done by calling the new List Sub Repositories API.

    2. For each repository R in the returned list:

      1. Check if R exists on the Rails side (container_repositories table);

      2. If it is missing, re-create it on the Rails side.

    3. Flag P as "registry data repaired" (timestamp).
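
A rough plain-Ruby sketch of the repair steps above, under the same kind of assumptions as before: `RepairableProject`, `projects_to_repair` and `repair_project` are hypothetical names, the list of registry paths is passed in directly instead of being fetched from the List Sub Repositories API, and a `Set` of paths stands in for the container_repositories table.

```ruby
require 'set'

# Hypothetical stand-in for a project row. repaired_at is nil until repaired.
RepairableProject = Struct.new(:full_path, :registry_enabled, :repair_needed, :repaired_at)

# Selects projects that have the registry enabled, were flagged as needing a
# repair, and were not repaired yet.
def projects_to_repair(projects)
  projects.select do |project|
    project.registry_enabled && project.repair_needed && project.repaired_at.nil?
  end
end

# Re-creates every repository that the registry knows about but Rails does not,
# then stamps the project as repaired. Returns the paths that were re-created.
def repair_project(project, registry_paths, rails_repository_paths, now: Time.now)
  missing = registry_paths.reject { |path| rails_repository_paths.include?(path) }
  missing.each { |path| rails_repository_paths.add(path) }

  project.repaired_at = now
  missing
end
```

In this sketch, repositories that already exist are skipped, so re-running the repair for a project that was interrupted halfway only creates what is still missing.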

By looking at the "registry data repaired" flag we'll be able to track progress and resume the "loop" in case of failure.

As with part 1/2, this is best implemented as a background job. Similarly, we should place it behind a global feature flag, and its max concurrency should be configurable through an application setting (which can be the same one used for part 1/2).

Edited by João Pereira