Add Ops feature flag to increase CI config fetch timeout

Summary

Add an Ops feature flag that increases the CI config fetch timeout (GITLAB_CI_CONFIG_FETCH_TIMEOUT_SECONDS) to 90 seconds for specific namespaces. This allows us to temporarily grant affected groups a higher timeout without requiring an infrastructure MR to change the environment variable.

Background

In #588313, we've been investigating intermittent Gitlab::Ci::Config::External::Context::TimeoutError failures during pipeline creation. The timeout was raised from the default 30s to 45s on GitLab.com via the environment variable, but some customers with complex configurations (many includes, components, etc.) still hit the limit, especially during periods of Gitaly or infrastructure degradation.

In this comment, it was proposed to make the timeout controllable via an Ops feature flag. The original suggestion was three flags (for 60, 90, and 120 seconds), but after further discussion we agreed on a single Ops feature flag for 90 seconds.

Proposal

  • Add a single Ops-type feature flag (e.g. ci_config_fetch_timeout_override) that, when enabled for a namespace, overrides the CI config fetch timeout to 90 seconds.
  • The flag should use a group-level actor so it can be selectively enabled for affected customer namespaces via ChatOps.
  • This is intended as a temporary measure to provide relief to customers experiencing timeout errors while longer-term performance improvements (per-request Gitaly/HTTP timeouts, caching, variable calculation optimizations) are rolled out.
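A minimal sketch of the timeout selection described in this proposal and in the implementation notes, written in Ruby to match lib/gitlab/ci/config.rb. It is not GitLab code: the `CiConfigFetchTimeout` module, the `Namespace`/`Project` structs, and the injected `flag_enabled` callable are hypothetical stand-ins so the snippet runs without Rails. In the application, the check would presumably wrap `Feature.enabled?` with the project's root namespace as the actor, and the flag name is only the example name from this issue.

```ruby
# Hypothetical stand-ins for the Namespace and Project models (not GitLab code).
Namespace = Struct.new(:id, :path)
Project = Struct.new(:id, :root_namespace)

# Hypothetical module illustrating the proposed override; not GitLab's API.
module CiConfigFetchTimeout
  # Used when GITLAB_CI_CONFIG_FETCH_TIMEOUT_SECONDS is not set.
  DEFAULT_SECONDS = 30
  # Timeout granted to namespaces that have the Ops flag enabled.
  OVERRIDE_SECONDS = 90
  # Example flag name from this issue; not final.
  FLAG_NAME = :ci_config_fetch_timeout_override

  # Returns the config fetch timeout in seconds for the given project.
  #
  # flag_enabled: callable (flag_name, actor) -> Boolean. In the application
  # this would presumably wrap Feature.enabled?; it is injected here so the
  # sketch runs standalone.
  def self.seconds_for(project, flag_enabled:, env: ENV)
    # The project's root namespace is used as the group-level actor.
    actor = project&.root_namespace
    return OVERRIDE_SECONDS if actor && flag_enabled.call(FLAG_NAME, actor)

    # Otherwise fall back to the existing environment variable (or 30s).
    env.fetch('GITLAB_CI_CONFIG_FETCH_TIMEOUT_SECONDS', DEFAULT_SECONDS).to_i
  end
end

# Usage: the flag is enabled only for the affected group's root namespace.
affected = Namespace.new(1, 'affected-group')
other = Namespace.new(2, 'other-group')
flag = ->(_name, namespace) { namespace.id == affected.id }
dot_com_env = { 'GITLAB_CI_CONFIG_FETCH_TIMEOUT_SECONDS' => '45' }

p CiConfigFetchTimeout.seconds_for(Project.new(10, affected), flag_enabled: flag, env: dot_com_env) # => 90
p CiConfigFetchTimeout.seconds_for(Project.new(11, other), flag_enabled: flag, env: dot_com_env)    # => 45
p CiConfigFetchTimeout.seconds_for(Project.new(12, other), flag_enabled: flag, env: {})             # => 30
```

As written, the flag always yields 90 seconds, even if the environment variable were ever set above 90; taking the larger of the two values would be one alternative if that case matters.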

Why an Ops feature flag?

  • Changing GITLAB_CI_CONFIG_FETCH_TIMEOUT_SECONDS instance-wide requires an infrastructure MR and affects all projects, including those where a longer timeout would only keep Sidekiq workers blocked for longer before a genuine timeout.
  • An Ops feature flag with a group-level actor lets us target specific namespaces that are known to be affected, without increasing the blast radius for the rest of GitLab.com.
  • It can be toggled quickly via ChatOps without a deploy.

Implementation notes

  • The timeout is currently read in lib/gitlab/ci/config.rb and the project context is available, so a group-scoped flag check should be straightforward (the project's root namespace can be used as the actor).
  • When the flag is enabled for a namespace, use 90 seconds as the timeout; otherwise, fall back to the existing GITLAB_CI_CONFIG_FETCH_TIMEOUT_SECONDS value (currently 45s on GitLab.com, 30s default).

Related issues

  • #588313 - Request timed out when fetching configuration files (parent issue)
  • #590947 - ci_config_gitaly_timeout feature flag rollout
  • #590948 - ci_config_http_timeout feature flag rollout