
Configurable online GC review delay

Context

As described in the spec, online GC relies on a set of database triggers and functions which take care of queueing blobs and manifests for review.

Problem

By default, each queued blob or manifest becomes eligible for review one day after it is queued (the review_after column is set one day ahead of the queueing time), and this delay is not configurable:

Function/trigger                | Queue inserted into      | review_after
--------------------------------|--------------------------|-------------------------------------------------------------------
gc_track_blob_uploads           | gc_blob_review_queue     | Default (1 day), incremented by another 1 day in case of conflict.
gc_track_deleted_manifests      | gc_blob_review_queue     | Default (1 day), incremented by another 1 day in case of conflict.
gc_track_deleted_layers         | gc_blob_review_queue     | Default (1 day), incremented by another 1 day in case of conflict.
gc_track_manifest_uploads       | gc_manifest_review_queue | Default (1 day).
gc_track_deleted_manifest_lists | gc_manifest_review_queue | Default (1 day), incremented by another 1 day in case of conflict.
gc_track_deleted_tags           | gc_manifest_review_queue | Default (1 day), incremented by another 1 day in case of conflict.
gc_track_switched_tags          | gc_manifest_review_queue | Default (1 day), incremented by another 1 day in case of conflict.
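
To make the current behavior concrete, here is a minimal Python model of the queueing rule in the table. It is only an illustration, not the actual trigger code: the `enqueue` helper and the dictionary standing in for a review queue are hypothetical, and it assumes that "incremented by another 1 day" means one day is added to the existing review_after value.

```python
from datetime import datetime, timedelta, timezone

DEFAULT_REVIEW_DELAY = timedelta(days=1)  # the fixed 1-day default


def enqueue(queue, key, now, delay=DEFAULT_REVIEW_DELAY):
    """Queue `key` for review, or postpone its review on conflict.

    `queue` is a dict standing in for a review queue table: it maps an
    artifact key (for example a blob digest) to its review_after time.
    """
    if key in queue:
        # Conflict: the artifact is already queued.
        # Assumption: the extra day is added to the existing review_after.
        queue[key] = queue[key] + delay
    else:
        queue[key] = now + delay
    return queue[key]


now = datetime(2021, 1, 1, tzinfo=timezone.utc)
blob_queue = {}
first = enqueue(blob_queue, "sha256:abc", now)   # first time queued
second = enqueue(blob_queue, "sha256:abc", now)  # queued again: conflict
```

In this model `first` is one day after `now` and `second` is two days after `now`. Note that gc_track_manifest_uploads has no conflict increment, so only the first branch would apply to it.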

Considering this:

  1. As we move through the gradual rollout of the new registry with online GC for GitLab.com, gradually increasing the load on the application, we may need to adjust the default value for review_after. Changing it frequently would require a database migration (alter table) each time, which may be problematic.

  2. As we gain additional insight on how online GC performs under load, we may also find it useful to fine-tune the default review_after individually for each artifact (blob or manifest) and operation (queued in response to a manifest delete, a tag delete, etc.) pair. Having to drop and recreate the online GC functions to use a non-default value for review_after may be problematic.

  3. Last but not least, for QA tests, it would be useful to have no review delay. If we could "disable" it (set the default review_after to NOW()) using an application/environment configuration, we could perform a series of API requests that would let us validate the online GC behavior. For example, we could upload a series of images, and then delete all tags for a few of them. After a few seconds, online GC should have removed the dangling images, and we could assert that by trying to pull them, which should fail. Currently, there is no way to do this.

Possible solution

For problems 1 and 2, we could have a gc_settings table (similar to how GitLab Rails uses application_settings). It would have a column for the default delay (either a single one or several, one per artifact/operation pair). Each function would then source the proper review_after from this table. A regular database migration would fill the table with default values. To change them we would need a database migration (a simple update) as well, or:
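
As a rough sketch of the lookup each function would perform, the Python model below resolves the delay for an artifact/operation pair and falls back to the default when no specific value is set. The dictionary stands in for the proposed gc_settings table; the key names, the example values and the `review_delay` helper are all assumptions made for illustration, since the table layout has not been decided.

```python
from datetime import timedelta

DEFAULT_DELAY = timedelta(days=1)  # current default review delay

# Hypothetical gc_settings contents: (artifact, operation) -> review delay.
gc_settings = {
    ("blob", "layer_delete"): timedelta(hours=12),
    ("manifest", "tag_delete"): timedelta(hours=6),
}


def review_delay(settings, artifact, operation):
    """Return the delay for this pair, or the default if none is set."""
    return settings.get((artifact, operation), DEFAULT_DELAY)


tag_delete_delay = review_delay(gc_settings, "manifest", "tag_delete")
blob_upload_delay = review_delay(gc_settings, "blob", "upload")
```

Here `tag_delete_delay` is 6 hours because a specific row exists, while `blob_upload_delay` falls back to the 1-day default.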

For problem 3, and as a possible addition for 1 and 2 as well, we could read the desired review delay settings from the application configuration file at boot time and update the gc_settings table accordingly (if any custom values were set), or leave it with its default values. This can create concurrency problems in clustered environments, as several instances would be trying to do the same operation on gc_settings. To tackle this we could either use a lock mechanism or add a randomized jitter before the update operation and let the last write win.
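
The jitter variant could look roughly like the sketch below, again as a Python model rather than registry code. The `sync_gc_settings` helper, the `MAX_JITTER_SECONDS` bound and the dictionary standing in for gc_settings are hypothetical. The sleep and random functions are passed in as parameters so the behavior can be checked without actually waiting.

```python
import random
import time
from datetime import timedelta

MAX_JITTER_SECONDS = 5.0  # hypothetical upper bound for the jitter


def sync_gc_settings(store, configured, sleep=time.sleep, rng=random.random):
    """Apply custom review delays from the configuration at boot time.

    `store` stands in for the gc_settings table and `configured` holds the
    values read from the configuration file. Returns True if an update was
    written, False if the defaults were left untouched.
    """
    if not configured:
        # No custom values: leave gc_settings with its default values.
        return False
    # Spread out instances booting at the same time; no lock is taken,
    # so whichever instance writes last determines the final values.
    sleep(rng() * MAX_JITTER_SECONDS)
    store.update(configured)
    return True
```

For QA, the configuration would carry a zero delay, for example `sync_gc_settings(store, {("blob", "upload"): timedelta(0)})`, so that queued artifacts become reviewable immediately. Since every instance reads the same configuration, the last write winning should be harmless as long as the configured values agree across the cluster.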

Edited by João Pereira