[FF] Rollout `ci_cache_project_includes` - cross-request caching for CI project includes

Summary

This issue tracks the production rollout of the feature that is currently behind the ci_cache_project_includes feature flag.

Feature flag type: ops

This flag gates cross-request content caching, backed by Redis (Redis::RepositoryCache), for include:project CI configuration files. When enabled, file content fetched from Gitaly during pipeline creation is cached under keys built from project.id + SHA + path, with a 4-hour TTL. Because each key includes the Git SHA, and the content at a given SHA never changes, a cached entry cannot become stale and no invalidation logic is needed.
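
As a rough illustration of why SHA-based keys need no invalidation, here is a minimal Python sketch. The key layout, the IncludeRef type, and the function name are assumptions made for this example; only the three key components (project ID, SHA, path) and the 4-hour TTL come from this issue.

```python
from typing import NamedTuple

# TTL stated in this issue: 4 hours, expressed in seconds.
CACHE_TTL_SECONDS = 4 * 60 * 60


class IncludeRef(NamedTuple):
    """One included file at one exact commit (hypothetical type)."""
    project_id: int
    sha: str
    path: str


def cache_key(ref: IncludeRef) -> str:
    # The string layout is an assumption for illustration; the real
    # key format used by the feature is not described in this issue.
    return f"ci:project_include:{ref.project_id}:{ref.sha}:{ref.path}"


old = IncludeRef(42, "a" * 40, "templates/build.yml")
new = IncludeRef(42, "b" * 40, "templates/build.yml")

# A new commit has a new SHA and therefore a new key. The old entry is
# never served for the new commit; it simply ages out after the TTL.
print(cache_key(old) != cache_key(new))
```

Under this scheme, a change to the included file can only be observed through a different SHA, so the old and new content never compete for the same key.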

Note: This flag is intended to remain in the codebase long-term. It will be enabled at 100% on GitLab.com but not rolled out to self-managed instances, since we cannot predict the Redis capacity of self-managed deployments. Keeping the flag in place lets us turn the cache off quickly if Redis memory saturation occurs.

Introduced in: !228106 (merged)

Owners

  • Most appropriate Slack channel to reach out to: #g_pipeline-authoring
  • Best individual to reach out to: @avielle

Expectations

What are we expecting to happen?

When enabled, CI pipeline creation will cache the content of include:project files in Redis::RepositoryCache with a 4-hour TTL. Subsequent pipeline creations that reference the same file at the same SHA will read from Redis instead of making Gitaly blobs_at calls. This should:

  • Significantly reduce Gitaly load from CI config file fetching
  • Reduce intermittent TimeoutError failures for customers with large numbers of includes
  • Decrease pipeline creation latency for pipelines that share common included files
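
The expected reduction in backend calls can be sketched with a small read-through cache. This is an in-memory stand-in, not the Redis::RepositoryCache implementation: TTLCache, FakeClock, and load_from_backend are hypothetical names, and the loader only imitates a Gitaly fetch.

```python
import time
from typing import Callable, Dict, Tuple


class TTLCache:
    """Tiny in-memory stand-in for a Redis cache with per-entry expiry."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, str]] = {}

    def fetch(self, key: str, loader: Callable[[], str]) -> str:
        """Return the cached value, or call loader() and store its result."""
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # hit: no backend call
        value = loader()  # miss or expired entry: one backend call
        self._entries[key] = (now + self._ttl, value)
        return value


class FakeClock:
    """Manually advanced clock so the example does not need to sleep."""

    def __init__(self) -> None:
        self.now = 0.0

    def __call__(self) -> float:
        return self.now


backend_calls = 0


def load_from_backend() -> str:
    """Hypothetical stand-in for fetching the file content from Gitaly."""
    global backend_calls
    backend_calls += 1
    return "stages: [build, test]\n"


clock = FakeClock()
cache = TTLCache(ttl_seconds=4 * 60 * 60, clock=clock)
key = "42:" + "a" * 40 + ":templates/build.yml"

# Five pipeline creations referencing the same file at the same SHA.
for _ in range(5):
    cache.fetch(key, load_from_backend)
print(backend_calls)  # 1

# Once the 4-hour TTL has elapsed, the next read goes to the backend again.
clock.now += 4 * 60 * 60
cache.fetch(key, load_from_backend)
print(backend_calls)  # 2
```

In this sketch, the saving grows with the number of pipelines that share an included file within the TTL window, which is the case the third bullet describes.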

What can go wrong and how would we detect it?

  1. Redis memory saturation: Caching file content in Redis increases memory usage. Monitor Redis memory metrics on Redis::RepositoryCache. If memory usage spikes, disable the flag immediately.
  2. Stale content served: This should not happen because cache keys include the Git SHA (immutable), but if somehow stale content is observed, disabling the flag will bypass the cache entirely.
  3. Increased error rates on pipeline creation: Monitor Sidekiq and web error rates for Ci::CreatePipelineService. If error rates increase after enablement, disable the flag.
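
The mitigation in all three cases relies on the flag acting as a full bypass. A minimal sketch of that behaviour, assuming a simple boolean gate in front of the cache (read_include and its parameters are hypothetical, not the actual code path):

```python
from typing import Callable, Dict


def read_include(flag_enabled: bool, cache: Dict[str, str], key: str,
                 loader: Callable[[], str]) -> str:
    """Read an included file, using the cache only while the flag is on."""
    if not flag_enabled:
        # Flag off: neither read from nor write to the cache.
        return loader()
    if key not in cache:
        cache[key] = loader()
    return cache[key]


# Pretend the cache somehow holds a bad entry for this key.
cache = {"42:abc123:ci/build.yml": "cached copy"}


def loader() -> str:
    return "fresh copy"


# Flag on: the cached entry is served.
print(read_include(True, cache, "42:abc123:ci/build.yml", loader))
# Flag off: the loader result is returned and the cache is left untouched.
print(read_include(False, cache, "42:abc123:ci/build.yml", loader))
```

Because the disabled path never writes to the cache either, turning the flag off also stops further memory growth from new entries, while existing entries expire through their TTL.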

Rollout Steps

Note: Please make sure to run the chatops commands in the Slack channel that gets impacted by the command.

Rollout on non-production environments

  • Enable the feature globally on non-production environments with /chatops gitlab run feature set ci_cache_project_includes true --dev --pre --staging --staging-ref

Specific rollout on production

  • Enable the feature for specific projects first with /chatops gitlab run feature set --project=gitlab-org/gitlab,gitlab-org/gitlab-foss,gitlab-com/www-gitlab-com ci_cache_project_includes true

Global rollout on production

For visibility, all /chatops commands that target production must be executed in the #production Slack channel and cross-posted (with the command results) to the responsible team's Slack channel.

  • Incrementally roll out the feature on production.
    • Recommended rollout steps for this flag given the potential Redis memory impact:
      • /chatops gitlab run feature set ci_cache_project_includes 10 --actors
      • /chatops gitlab run feature set ci_cache_project_includes 25 --actors
      • /chatops gitlab run feature set ci_cache_project_includes 50 --actors
      • /chatops gitlab run feature set ci_cache_project_includes 100 --actors
    • Between steps, wait at least 15 minutes and monitor:
      • Redis RepositoryCache memory usage on https://dashboards.gitlab.net
      • Gitaly request rates (should decrease)
      • Pipeline creation error rates
      • TimeoutError rates for CI config fetching
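
For intuition about what each percentage step does, the sketch below shows one common way to implement a percentage-of-actors gate: hash the flag name together with the actor identifier into a stable bucket from 0 to 99. This is an assumption for illustration; the feature flag library used by GitLab may differ in detail, and the actor IDs here are made up.

```python
import zlib


def actor_enabled(feature: str, actor_id: str, percentage: int) -> bool:
    """Deterministically decide whether an actor falls inside the rollout."""
    bucket = zlib.crc32(f"{feature}{actor_id}".encode("utf-8")) % 100
    return bucket < percentage


FEATURE = "ci_cache_project_includes"
actors = [f"Project:{i}" for i in range(1000)]  # hypothetical actor IDs

# Collect the set of enabled actors at each step of the rollout plan.
enabled_at = {
    pct: {a for a in actors if actor_enabled(FEATURE, a, pct)}
    for pct in (10, 25, 50, 100)
}

# Each step only adds actors: anyone enabled at 10% stays enabled at 25%.
print(enabled_at[10] <= enabled_at[25] <= enabled_at[50] <= enabled_at[100])
print(len(enabled_at[100]) == len(actors))
```

With a gate of this shape, an actor's bucket does not change between steps, so raising the percentage widens the cached population without flipping already-enabled actors back and forth.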