[FF] Rollout `ci_cache_project_includes` - cross-request caching for CI project includes
Summary
This issue tracks the production rollout of the feature that is currently behind the ci_cache_project_includes feature flag.
Feature flag type: ops
This flag gates Redis-backed (Redis::RepositoryCache) cross-request content caching for include:project CI configuration files. When enabled, file content fetched from Gitaly during pipeline creation is cached using SHA-keyed cache keys (project.id + SHA + path) with a 4-hour TTL. Since Git SHAs are immutable, the cache is inherently safe and requires no invalidation logic.
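As a rough illustration of the SHA-keyed scheme described above, the following sketch builds a cache key from the project ID, the SHA, and the file path. The helper name `include_cache_key`, the `ci_project_include:` prefix, and the path hashing are assumptions made for this example; the actual key layout in the merge request may differ.

```ruby
require 'digest'

# 4-hour TTL, as stated in the summary.
CACHE_TTL_SECONDS = 4 * 60 * 60

# Hypothetical helper: builds a key from project ID + SHA + path.
# The prefix and the path digest are illustrative choices only.
def include_cache_key(project_id:, sha:, path:)
  # Hash the path so long or unusual file names yield a fixed-length suffix.
  path_digest = Digest::SHA256.hexdigest(path)
  "ci_project_include:#{project_id}:#{sha}:#{path_digest}"
end

# Same file at two different commits: the keys differ, so content cached
# for one SHA can never be served for the other.
key_a = include_cache_key(project_id: 42, sha: 'a1b2c3', path: '.gitlab/ci/build.yml')
key_b = include_cache_key(project_id: 42, sha: 'd4e5f6', path: '.gitlab/ci/build.yml')
```

Because a new commit produces a new SHA and therefore a new key, old entries are never overwritten and simply expire with the TTL, which is why no invalidation logic is needed.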
Note: This flag is intended to remain in the codebase long-term. It will be enabled at 100% on GitLab.com but not rolled out to self-managed, since we cannot predict the Redis capacity of self-managed deployments. This allows quick toggling if Redis memory saturation occurs.
Introduced in: !228106 (merged)
Owners
- Most appropriate Slack channel to reach out to: #g_pipeline-authoring
- Best individual to reach out to: @avielle
Expectations
What are we expecting to happen?
When enabled, CI pipeline creation will cache the content of include:project files in Redis::RepositoryCache with a 4-hour TTL. Subsequent pipeline creations that reference the same file at the same SHA will read from Redis instead of making Gitaly blobs_at calls. This should:
- Significantly reduce Gitaly load from CI config file fetching
- Reduce intermittent TimeoutError failures for customers with large numbers of includes
- Decrease pipeline creation latency for pipelines that share common included files
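The expected behaviour above can be sketched as a read-through cache. This is a minimal in-memory simulation, not the actual implementation: `TtlCache` stands in for Redis::RepositoryCache, `FakeGitaly` stands in for the Gitaly client, and `fetch_include` is a hypothetical helper. All three names are assumptions introduced for illustration.

```ruby
# In-memory stand-in for a Redis-backed cache with a TTL (illustrative only).
class TtlCache
  def initialize(ttl_seconds:, clock:)
    @ttl = ttl_seconds
    @clock = clock # callable returning the current time in seconds
    @store = {}
  end

  # Returns the cached value if present and unexpired; otherwise runs the
  # block, stores its result with a fresh expiry, and returns it.
  def fetch(key)
    entry = @store[key]
    return entry[:value] if entry && @clock.call < entry[:expires_at]

    value = yield
    @store[key] = { value: value, expires_at: @clock.call + @ttl }
    value
  end
end

# Hypothetical stand-in for the Gitaly call; counts how often it is invoked.
class FakeGitaly
  attr_reader :calls

  def initialize
    @calls = 0
  end

  def blob_content(project_id, sha, path)
    @calls += 1
    "content of #{path} at #{sha} in project #{project_id}"
  end
end

# Read-through lookup keyed on project ID + SHA + path.
def fetch_include(cache, gitaly, project_id:, sha:, path:)
  key = "#{project_id}:#{sha}:#{path}"
  cache.fetch(key) { gitaly.blob_content(project_id, sha, path) }
end

now = 0.0 # fake clock so TTL expiry can be simulated
cache = TtlCache.new(ttl_seconds: 4 * 60 * 60, clock: -> { now })
gitaly = FakeGitaly.new

# Two pipelines referencing the same file at the same SHA: one Gitaly call.
2.times { fetch_include(cache, gitaly, project_id: 1, sha: 'abc123', path: 'ci/base.yml') }
calls_same_sha = gitaly.calls

# A new SHA is a new key, so it misses and goes to Gitaly.
fetch_include(cache, gitaly, project_id: 1, sha: 'def456', path: 'ci/base.yml')
calls_new_sha = gitaly.calls

# After the 4-hour TTL the original entry has expired and is refetched.
now += 4 * 60 * 60 + 1
fetch_include(cache, gitaly, project_id: 1, sha: 'abc123', path: 'ci/base.yml')
calls_after_expiry = gitaly.calls
```

In this simulation only the first lookup per key reaches the fake Gitaly client, which is the mechanism behind the expected reduction in Gitaly load.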
What can go wrong and how would we detect it?
- Redis memory saturation: Caching file content in Redis increases memory usage. Monitor Redis memory metrics on Redis::RepositoryCache. If memory usage spikes, disable the flag immediately.
  - Dashboard: https://dashboards.gitlab.net (Redis RepositoryCache metrics)
- Stale content served: This should not happen because cache keys include the Git SHA (immutable), but if somehow stale content is observed, disabling the flag will bypass the cache entirely.
- Increased error rates on pipeline creation: Monitor Sidekiq and web error rates for Ci::CreatePipelineService. If error rates increase after enablement, disable the flag.
Rollout Steps
Note: Please make sure to run the chatops commands in the Slack channel that gets impacted by the command.
Rollout on non-production environments
- Enable the feature globally on non-production environments with `/chatops gitlab run feature set ci_cache_project_includes true --dev --pre --staging --staging-ref`
Specific rollout on production
- `/chatops gitlab run feature set --project=gitlab-org/gitlab,gitlab-org/gitlab-foss,gitlab-com/www-gitlab-com ci_cache_project_includes true`
Global rollout on production
For visibility, all /chatops commands that target production must be executed in the #production Slack channel and cross-posted (with the command results) to the responsible team's Slack channel.
- Incrementally roll out the feature on production.
- Recommended rollout steps for this flag given the potential Redis memory impact:
  - `/chatops gitlab run feature set ci_cache_project_includes 10 --actors`
  - `/chatops gitlab run feature set ci_cache_project_includes 25 --actors`
  - `/chatops gitlab run feature set ci_cache_project_includes 50 --actors`
  - `/chatops gitlab run feature set ci_cache_project_includes 100 --actors`
- Between every step wait for at least 15 minutes and monitor:
  - Redis RepositoryCache memory usage on https://dashboards.gitlab.net
  - Gitaly request rates (should decrease)
  - Pipeline creation error rates
  - TimeoutError rates for CI config fetching
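To illustrate how a percentage-of-actors rollout such as 10, 25, 50, 100 can widen without flapping, the sketch below uses a simplified gate: each actor hashes to a stable bucket from 0 to 99 and is enabled when the bucket is below the configured percentage. This is an assumption-laden simplification, not the exact hashing used by GitLab's feature flag backend; `enabled_for_actor?` and the `Project:<id>` actor names are hypothetical.

```ruby
require 'zlib'

FLAG = 'ci_cache_project_includes'
ROLLOUT_STEPS = [10, 25, 50, 100].freeze

# Simplified percentage-of-actors gate (illustrative, not GitLab's exact
# algorithm): an actor is enabled when its stable hash bucket (0..99)
# falls below the configured percentage.
def enabled_for_actor?(flag, actor_id, percentage)
  Zlib.crc32("#{flag}:#{actor_id}") % 100 < percentage
end

# Hypothetical actor identifiers for 1,000 projects.
actors = (1..1_000).map { |id| "Project:#{id}" }

# For each rollout step, collect the actors for which the flag is enabled.
enabled_by_step = ROLLOUT_STEPS.to_h do |pct|
  [pct, actors.select { |actor| enabled_for_actor?(FLAG, actor, pct) }]
end

# Every actor enabled at a lower step stays enabled at the next step,
# because its bucket does not change when the percentage increases.
monotonic = ROLLOUT_STEPS.each_cons(2).all? do |lower, higher|
  (enabled_by_step[lower] - enabled_by_step[higher]).empty?
end

ROLLOUT_STEPS.each do |pct|
  puts "#{pct}%: #{enabled_by_step[pct].size} of #{actors.size} actors enabled"
end
```

Under this model, raising the percentage only adds actors, so the cache population built up at an earlier step carries over to the next one and each step's Redis memory reading reflects a growing share of the same traffic.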