Add per-request Gitaly timeout for CI config file fetching

What does this MR do and why?

To reduce the blast radius during infrastructure incidents, the proposal is to introduce static per-request timeouts for individual external calls made while fetching CI configuration. This would not prevent pipeline failures during incidents, but it would make them fail fast instead of holding Sidekiq workers for up to 30 seconds. Failing fast reduces resource consumption, gives users quicker feedback so they can retry, and makes failures easier to correlate with infrastructure metrics.

Introduce a configurable timeout (GITLAB_CI_CONFIG_GITALY_TIMEOUT_SECONDS) for Gitaly calls when fetching CI configuration files. This addresses pipeline creation hanging when Gitaly experiences slowness.
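As a rough illustration of how such an environment variable might be read, here is a minimal Ruby sketch. The helper name `ci_config_gitaly_timeout` and the fallback `DEFAULT_TIMEOUT_SECONDS` are assumptions made for this example and are not taken from the MR; only the variable name comes from the description above.

```ruby
# Assumed fallback used when the variable is unset or invalid.
# This value is illustrative, not the one used by the MR.
DEFAULT_TIMEOUT_SECONDS = 30.0

# Hypothetical helper: returns the timeout in seconds as a Float.
# `env` defaults to ENV but accepts any Hash-like object for testing.
def ci_config_gitaly_timeout(env = ENV)
  raw = env['GITLAB_CI_CONFIG_GITALY_TIMEOUT_SECONDS']

  # Float(..., exception: false) returns nil instead of raising on bad input.
  seconds = Float(raw, exception: false) if raw

  # Accept only positive numbers; anything else falls back to the default.
  return seconds if seconds && seconds.positive?

  DEFAULT_TIMEOUT_SECONDS
end
```

Falling back to a default on invalid input, rather than raising, is one possible design choice; it keeps a typo in the environment from breaking pipeline creation.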

The timeout applies to local, project, and component includes via a thread-local timeout wrapper that propagates to Gitaly client calls. The behavior is controlled by the ci_config_gitaly_timeout feature flag (#590947).
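The thread-local wrapper idea can be sketched roughly as follows. This is not the MR's implementation: the module `CiConfigGitalyTimeout`, the key name, the `enabled:` argument standing in for the feature flag check, and `FakeGitalyClient` with its default of 30 are all hypothetical names and values chosen for illustration.

```ruby
module CiConfigGitalyTimeout
  KEY = :ci_config_gitaly_timeout # hypothetical thread-local key

  # Runs the block with `seconds` visible to client calls made on the
  # current thread. `enabled` stands in for a feature flag check.
  def self.with_timeout(seconds, enabled: true)
    return yield unless enabled

    # Note: Thread.current[] storage is fiber-local in Ruby.
    previous = Thread.current[KEY]
    Thread.current[KEY] = seconds
    begin
      yield
    ensure
      # Restore the outer value so nested or later calls are unaffected,
      # even when the block raises.
      Thread.current[KEY] = previous
    end
  end

  def self.current
    Thread.current[KEY]
  end
end

# Stand-in for a Gitaly client. It only reports which timeout it would
# pass along with the request; it performs no network call.
class FakeGitalyClient
  DEFAULT_TIMEOUT = 30 # assumed default, for illustration only

  def get_blobs(paths)
    timeout = CiConfigGitalyTimeout.current || DEFAULT_TIMEOUT
    { paths: paths, timeout: timeout }
  end
end
```

With this shape, the code that fetches includes does not need to pass a timeout through every method signature: it wraps the fetch in `CiConfigGitalyTimeout.with_timeout(1) { ... }` and the client picks the value up when it builds the request.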

References

Related to #588313

Screenshots or screen recordings

  1. Make the following changes to simulate Gitaly slowness
--- a/internal/gitaly/service/blob/get_blobs.go
+++ b/internal/gitaly/service/blob/get_blobs.go
@@ -5,6 +5,7 @@ import (
        "context"
        "errors"
        "io"
+       "time"

        "gitlab.com/gitlab-org/gitaly/v18/internal/git"
        "gitlab.com/gitlab-org/gitaly/v18/internal/git/catfile"
@@ -155,6 +156,8 @@ func sendBlobTreeEntry(
 }

 func (s *server) GetBlobs(req *gitalypb.GetBlobsRequest, stream gitalypb.BlobService_GetBlobsServer) error {
+       time.Sleep(4 * time.Second)
+
        if err := validateGetBlobsRequest(stream.Context(), s.locator, req); err != nil {
                return structerr.NewInvalidArgument("%w", err)
        }
--- a/lib/gitlab/ci/config.rb
+++ b/lib/gitlab/ci/config.rb
@@ -164,7 +164,7 @@ def expand_config(config, inputs)
         build_config(config, inputs)

       rescue Gitlab::Config::Loader::Yaml::DataTooLargeError, Gitlab::Ci::Config::External::Context::TimeoutError => e
-        track_and_raise_for_dev_exception(e)
+        # track_and_raise_for_dev_exception(e)
         raise Config::ConfigError, e.message

       rescue Gitlab::Ci::Config::Yaml::LoadError => e
  2. Disable gitaly.skip_compile
# gdk.yml

gitaly:
  skip_compile: false
  3. Run make gitaly-setup

  4. Enable the feature flag

Feature.enable(:ci_config_gitaly_timeout)
  5. Add export GITLAB_CI_CONFIG_GITALY_TIMEOUT_SECONDS=1 to gdk/env.runit

  6. Restart GDK

  7. Use a CI config with many includes from different sources

include:
  - local: includes/all.yml
  - project: root/basic
    file: includes/all.yml
  - project: my-components/component-1
    file: templates/component-1.yml
  - project: root/component-project-1
    file: .gitlab-ci.yml
  - project: my-components/components-2
    file: templates/component-a.yml
  - project: group-with-policy/policies
    file: pep1.yml
  - project: root/gitlab-clone
    file: .gitlab/ci/global.gitlab-ci.yml
  8. Try to run a pipeline

  9. Compare against the result when GITLAB_CI_CONFIG_GITALY_TIMEOUT_SECONDS is set to 5, the feature flag is disabled, or the master branch is used

  10. Roll back the changes from steps 1 and 2 and run make gitaly-setup again.
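The arithmetic behind the scenarios in the steps above can be modeled with a toy Ruby function. This is only a sketch of the comparison between the injected 4-second delay and the configured timeout. The function name is hypothetical, and treating `nil` (no per-request timeout from this change) as "completes" assumes that whatever timeout applies otherwise is longer than 4 seconds.

```ruby
# Delay injected into GetBlobs by the time.Sleep call in step 1.
INJECTED_DELAY_SECONDS = 4.0

# Hypothetical model: a call times out when its latency exceeds the
# per-request timeout. A nil timeout means no per-request timeout is
# applied (assumed to let the 4-second call complete).
def fetch_outcome(timeout_seconds, delay: INJECTED_DELAY_SECONDS)
  return :completed if timeout_seconds.nil?

  delay > timeout_seconds ? :timed_out : :completed
end

# Timeout of 1 second, timeout of 5 seconds, and no per-request timeout.
[1, 5, nil].each do |timeout|
  puts "timeout=#{timeout.inspect} -> #{fetch_outcome(timeout)}"
end
```

Under these assumptions, only the 1-second setting is shorter than the injected delay, so it is the only scenario in which the model reports a timeout.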

MR acceptance checklist

Evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.

Edited by Furkan Ayhan
