Add ci_config_fetch_timeout_override ops feature flag
What does this MR do and why?
Adds an Ops-type feature flag ci_config_fetch_timeout_override that overrides the CI config fetch timeout to 90 seconds when enabled for a specific namespace.
Some customers with complex CI configurations (many includes, components, etc.) intermittently hit the cumulative TimeoutError during pipeline creation, especially during periods of Gitaly or infrastructure degradation. The current timeout (GITLAB_CI_CONFIG_FETCH_TIMEOUT_SECONDS) is set instance-wide via an environment variable, so changing it requires an infrastructure MR that affects all projects.
This feature flag allows SREs to selectively increase the timeout for affected customer namespaces via ChatOps without a deploy, and without increasing the blast radius for the rest of GitLab.com:
/chatops run feature set --namespace=<namespace_id> ci_config_fetch_timeout_override true
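The selection logic described above can be sketched in plain Ruby. This is a minimal illustration, not the code in this MR: the `ConfigFetchTimeout` module, the `seconds_for` method, the `FakeFlags` stand-in for `Feature`, and the 30-second fallback are all assumptions made for the example. Only the flag name, the 90-second override, and the environment variable name come from the description.

```ruby
# Hypothetical sketch of per-namespace timeout selection.
# Module, method, and class names are illustrative, not GitLab's actual code.
module ConfigFetchTimeout
  OVERRIDE_TIMEOUT_SECONDS = 90 # value stated in this MR
  FALLBACK_TIMEOUT_SECONDS = 30 # assumed placeholder when the env variable is unset

  # Instance-wide default, read from the environment variable.
  def self.default_seconds(env = ENV)
    Integer(env.fetch('GITLAB_CI_CONFIG_FETCH_TIMEOUT_SECONDS', FALLBACK_TIMEOUT_SECONDS))
  end

  # Returns the override only when the flag is enabled for this namespace actor.
  def self.seconds_for(namespace, flags:, env: ENV)
    if flags.enabled?(:ci_config_fetch_timeout_override, namespace)
      OVERRIDE_TIMEOUT_SECONDS
    else
      default_seconds(env)
    end
  end
end

# Minimal in-memory stand-in for the Feature API, so the sketch runs outside Rails.
class FakeFlags
  def initialize
    @enabled = Hash.new { |hash, flag| hash[flag] = [] }
  end

  def enable(flag, actor)
    @enabled[flag] << actor unless @enabled[flag].include?(actor)
  end

  def disable(flag, actor)
    @enabled[flag].delete(actor)
  end

  def enabled?(flag, actor)
    @enabled[flag].include?(actor)
  end
end

flags = FakeFlags.new
flags.enable(:ci_config_fetch_timeout_override, :affected_namespace)

puts ConfigFetchTimeout.seconds_for(:affected_namespace, flags: flags, env: {})
puts ConfigFetchTimeout.seconds_for(:other_namespace, flags: flags, env: {})
```

With the flag enabled only for `:affected_namespace`, that namespace gets the 90-second override while every other namespace keeps the instance-wide value, which is the blast-radius property the flag is meant to provide.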
References
- Closes Add Ops feature flag to increase CI config fetc... (#594604 - closed)
- Parent issue: Request timed out when fetching configuration f... (#588313)
Feature flag
Name: ci_config_fetch_timeout_override
Type: ops (namespace actor)
How to set up and validate locally
- Enable the feature flag for a namespace:
  namespace = Namespace.find(<id>)
  Feature.enable(:ci_config_fetch_timeout_override, namespace)
- Create a pipeline for a project under that namespace and verify the config fetch timeout is set to 90 seconds (observable via logs or by stubbing in a test).
- Disable the flag and verify the default TIMEOUT_SECONDS is used:
  Feature.disable(:ci_config_fetch_timeout_override, namespace)
MR acceptance checklist
Evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.