Add timeout support in Projects::ContainerRepository::DeleteTagsService#execute
🤔 What does this MR do?
MR 1 of #208193 (comment 362910703) MR plan:
Put a timeout around DeleteTagsService#execute but only under those conditions:
- The GitLab Registry is in use. (This is what gitlab.com uses)
- The feature flag
container_registry_expiration_policies_throttling
is enabled - Application Setting
container_registry_delete_tags_service_timeout
set to anything else thannil
or0
⚔ Design choices
- For a better understanding how this service is used, see the schema in #208193 (comment 362910703).
- This service function is really simple: it receives a list of tags, it will loop on them and sending the appropriate request to the Container Registry to delete them.
- The main issue is that these requests to the Container Registry can be really slow.
- We can easily imagine what happens when the service receives let's say 100K tags to delete. Well, let's summarize it with
🐢 - One upcoming feature is having Container Registry Cleanup Policies open to all projects. The policies are executed (with a background worker) under a given schedule and each execution will use this service to delete a given list of tags.
- To avoid resources starvation, this MR puts an execution time limit to this service.
- When the limit is hit, the loop is stopped.
- As such, we can have a partial execution which is an accepted outcome.
- The default value of 250 seconds has been selected so that we are way below from the 300 seconds threshold that is applied for background workers (see gitlab-com/gl-infra/scalability#461 (comment 370427582))
- To have a better control of this execution time limit value, it is implemented as an application setting (
container_registry_delete_tags_service_timeout
)- The corresponding UI has been added to the container settings in the admin UI (located at
/admin/application_settings/ci_cd
) - This way, we can update this value directly from the UI
- The corresponding UI has been added to the container settings in the admin UI (located at
- As stated above, this MR is part of a larger effort to limit resources usage during Container Registry Cleanup Policies execution.
- This being a deep change from the current behavior, all the limits we are implementing in the larger effort are gated behind a feature flag:
container_registry_expiration_policies_throttling
. This provides us a safety switch between non throttled executions (current implementation) <-> throttled executions. This switch can be used in the case of a production issue. - This MR being the first of the larger effort, it is the one introducing the feature flag.
- This being a deep change from the current behavior, all the limits we are implementing in the larger effort are gated behind a feature flag:
- To be complete in this MR, the corresponding documentation has been updated with a
Caution
block to indicate that Cleanup Policies can be partially executed.
🍪 Database Migrations
Up
$ rails db:migrate:up VERSION=20200710113437
== 20200710113437 AddContainerRegistryDeleteTagsServiceTimeoutToApplicationSettings: migrating
-- add_column(:application_settings, :container_registry_delete_tags_service_timeout, :integer, {:default=>250, :null=>false})
-> 0.0054s
== 20200710113437 AddContainerRegistryDeleteTagsServiceTimeoutToApplicationSettings: migrated (0.0054s)
Down
$ rails db:migrate:down VERSION=20200710113437
== 20200710113437 AddContainerRegistryDeleteTagsServiceTimeoutToApplicationSettings: reverting
-- remove_column(:application_settings, :container_registry_delete_tags_service_timeout)
-> 0.0165s
== 20200710113437 AddContainerRegistryDeleteTagsServiceTimeoutToApplicationSettings: reverted (0.0166s)
📸 Screenshots
The admin UI:
🛃 Does this MR meet the acceptance criteria?
Conformity
-
Changelog entry -
Documentation (if required) -
Code review guidelines -
Merge request performance guidelines -
Style guides -
Database guides - [-] Separation of EE specific content
Availability and Testing
- [-] Review and add/update tests for this feature/bug. Consider all test levels. See the Test Planning Process.
- [-] Tested in all supported browsers
- [-] Informed Infrastructure department of a default or new setting change, if applicable per definition of done
Security
If this MR contains changes to processing or storing of credentials or tokens, authorization and authentication methods and other items described in the security review guidelines:
- [-] Label as security and @ mention
@gitlab-com/gl-security/appsec
- [-] The MR includes necessary changes to maintain consistency between UI, API, email, or other methods
- [-] Security reports checked/validated by a reviewer from the AppSec team
Edited by David Fernandez