Add ci_config_fetch_timeout_override ops feature flag

What does this MR do and why?

Adds an Ops-type feature flag ci_config_fetch_timeout_override that overrides the CI config fetch timeout to 90 seconds when enabled for a specific namespace.

Some customers with complex CI configurations (many includes, components, etc.) intermittently hit the cumulative TimeoutError during pipeline creation, especially during periods of Gitaly or infrastructure degradation. The current timeout (GITLAB_CI_CONFIG_FETCH_TIMEOUT_SECONDS) is set instance-wide via an environment variable, so changing it requires an infrastructure MR that affects all projects.

This feature flag allows SREs to selectively increase the timeout for affected customer namespaces via ChatOps without a deploy, and without increasing the blast radius for the rest of GitLab.com:

/chatops run feature set --namespace=<namespace_id> ci_config_fetch_timeout_override true
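
The intended behavior can be sketched in plain Ruby as follows. This is a minimal illustration, not the code in this MR: FakeFeature is a hypothetical stand-in for GitLab's Feature API so the snippet runs outside Rails, config_fetch_timeout_seconds is a made-up helper name, and the 30-second fallback for an unset environment variable is an assumption.

```ruby
# Hypothetical stand-in for GitLab's Feature API (illustration only),
# tracking which actors each flag is enabled for.
module FakeFeature
  @enabled = Hash.new { |hash, flag| hash[flag] = [] }

  class << self
    def enable(flag, actor)
      @enabled[flag] << actor unless @enabled[flag].include?(actor)
    end

    def disable(flag, actor)
      @enabled[flag].delete(actor)
    end

    def enabled?(flag, actor)
      @enabled[flag].include?(actor)
    end
  end
end

# Instance-wide default, read from the environment variable named in this MR.
# The fallback of 30 seconds is an assumption made for this sketch.
DEFAULT_TIMEOUT_SECONDS =
  Integer(ENV.fetch('GITLAB_CI_CONFIG_FETCH_TIMEOUT_SECONDS', '30'))

# Override value described in this MR.
OVERRIDE_TIMEOUT_SECONDS = 90

# Hypothetical helper: picks the timeout for a given namespace actor.
def config_fetch_timeout_seconds(namespace)
  if FakeFeature.enabled?(:ci_config_fetch_timeout_override, namespace)
    OVERRIDE_TIMEOUT_SECONDS
  else
    DEFAULT_TIMEOUT_SECONDS
  end
end
```

Because the flag is checked against the namespace actor, enabling it for one namespace leaves every other namespace on the instance-wide default.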

References

Feature flag

Name: ci_config_fetch_timeout_override
Type: ops (namespace actor)

How to set up and validate locally

  1. Enable the feature flag for a namespace:
    namespace = Namespace.find(<id>)
    Feature.enable(:ci_config_fetch_timeout_override, namespace)
  2. Create a pipeline for a project under that namespace and verify that the config fetch timeout is set to 90 seconds (observable via logs or by stubbing in a test).
  3. Disable the flag and verify the default TIMEOUT_SECONDS is used:
    Feature.disable(:ci_config_fetch_timeout_override, namespace)

MR acceptance checklist

Evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.

Edited by Sahil Sharma
