Add per-request Gitaly timeout for CI config file fetching
What does this MR do and why?
To reduce the blast radius during infrastructure incidents, the proposal is to introduce static per-request timeouts for individual external calls during CI config fetching. This won't prevent pipeline failures during incidents, but it would make them fail fast instead of holding Sidekiq workers for up to 30 seconds, reduce resource consumption, give users quicker feedback to retry, and make failures easier to correlate with infrastructure metrics.
Introduce a configurable timeout (GITLAB_CI_CONFIG_GITALY_TIMEOUT_SECONDS) for Gitaly calls when fetching CI configuration files. This addresses pipeline creation hanging when Gitaly is slow.
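As a rough illustration of how such an environment variable might be read, here is a minimal sketch. The helper name `ci_config_gitaly_timeout` and the 30-second fallback are assumptions made for the example, not code from this MR.

```ruby
# Hypothetical helper, not the MR's actual implementation.
# Assumed fallback used when the variable is unset or invalid.
DEFAULT_CI_CONFIG_GITALY_TIMEOUT_SECONDS = 30.0

# Returns the timeout in seconds as a Float.
# `env` is injectable so the parsing can be exercised without touching ENV.
def ci_config_gitaly_timeout(env = ENV)
  raw = env['GITLAB_CI_CONFIG_GITALY_TIMEOUT_SECONDS']
  # Float(..., exception: false) returns nil instead of raising on bad input.
  value = Float(raw, exception: false) if raw

  if value && value.positive?
    value
  else
    DEFAULT_CI_CONFIG_GITALY_TIMEOUT_SECONDS
  end
end
```

Rejecting non-numeric, zero, and negative values keeps a mistyped setting from disabling the timeout by accident.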
The timeout applies to local, project, and component includes via a
thread-local timeout wrapper that propagates to Gitaly client calls.
Controlled by the ci_config_gitaly_timeout feature flag (#590947).
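The wrapper idea can be sketched roughly as follows. This is only an illustration under stated assumptions: the module name `CiConfigGitalyTimeout`, the key `:ci_config_gitaly_timeout_seconds`, and `fake_gitaly_call` are hypothetical stand-ins, and the real Gitaly client integration in this MR may differ.

```ruby
# Hypothetical sketch of a scoped timeout override, not the MR's code.
module CiConfigGitalyTimeout
  # Assumed key name. Note that Thread.current[] is fiber-local in Ruby.
  KEY = :ci_config_gitaly_timeout_seconds

  # Runs the block with `seconds` visible to code further down the call stack,
  # then restores the previous value even if the block raises.
  def self.with_timeout(seconds)
    previous = Thread.current[KEY]
    Thread.current[KEY] = seconds
    yield
  ensure
    Thread.current[KEY] = previous
  end

  # Returns the override currently in effect, or nil when none is set.
  def self.current
    Thread.current[KEY]
  end
end

# Stand-in for a Gitaly client call: it prefers the scoped override
# and otherwise uses its own default timeout.
def fake_gitaly_call(default_timeout: 30)
  { timeout: CiConfigGitalyTimeout.current || default_timeout }
end
```

With this shape, the include-fetching code only wraps its work in `CiConfigGitalyTimeout.with_timeout(1) { ... }`, and callers outside the block keep their default timeout.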
References
Related to #588313
Screenshots or screen recordings
- Make changes to simulate Gitaly slowness
--- a/internal/gitaly/service/blob/get_blobs.go
+++ b/internal/gitaly/service/blob/get_blobs.go
@@ -5,6 +5,7 @@ import (
 	"context"
 	"errors"
 	"io"
+	"time"

 	"gitlab.com/gitlab-org/gitaly/v18/internal/git"
 	"gitlab.com/gitlab-org/gitaly/v18/internal/git/catfile"
@@ -155,6 +156,8 @@ func sendBlobTreeEntry(
 }

 func (s *server) GetBlobs(req *gitalypb.GetBlobsRequest, stream gitalypb.BlobService_GetBlobsServer) error {
+	time.Sleep(4 * time.Second)
+
 	if err := validateGetBlobsRequest(stream.Context(), s.locator, req); err != nil {
 		return structerr.NewInvalidArgument("%w", err)
 	}
--- a/lib/gitlab/ci/config.rb
+++ b/lib/gitlab/ci/config.rb
@@ -164,7 +164,7 @@ def expand_config(config, inputs)
         build_config(config, inputs)
       rescue Gitlab::Config::Loader::Yaml::DataTooLargeError, Gitlab::Ci::Config::External::Context::TimeoutError => e
-        track_and_raise_for_dev_exception(e)
+        # track_and_raise_for_dev_exception(e)
         raise Config::ConfigError, e.message
       rescue Gitlab::Ci::Config::Yaml::LoadError => e
- Disable gitaly.skip_compile
  # gdk.yml
  gitaly:
    skip_compile: false
- Run make gitaly-setup
- Enable the FF
  Feature.enable(:ci_config_gitaly_timeout)
- Set export GITLAB_CI_CONFIG_GITALY_TIMEOUT_SECONDS=1 in gdk/env.runit
- Restart GDK
- Have a CI Config with many includes from different sources
  include:
    - local: includes/all.yml
    - project: root/basic
      file: includes/all.yml
    - project: my-components/component-1
      file: templates/component-1.yml
    - project: root/component-project-1
      file: .gitlab-ci.yml
    - project: my-components/components-2
      file: templates/component-a.yml
    - project: group-with-policy/policies
      file: pep1.yml
    - project: root/gitlab-clone
      file: .gitlab/ci/global.gitlab-ci.yml
- Try to run the pipeline
- If you set GITLAB_CI_CONFIG_GITALY_TIMEOUT_SECONDS to 5, disable the FF, or use the master branch
- Roll back the changes from the first two steps (the Gitaly patch and the gdk.yml change) and run make gitaly-setup again.
MR acceptance checklist
Evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.