[FF] `search_finders_redis_cache` -- Rollout
Summary
This issue tracks the production rollout of the feature that is currently behind the search_finders_redis_cache feature flag.
Introduced by !224795 (closed) and finished in !231935 (merged). The flag controls a short-lived (5 min) Redis cache in Search::GroupsFinder and Search::ProjectsFinder, with invalidation driven by Search::ExpireFinderCacheWorker through a per-user cache-version bump.
Related: #594620 (investigate cache-key expiration and TTL tuning before rollout).
Owners
- Most appropriate Slack channel to reach out to: #g_global_search
- Best individual to reach out to: @terrichu
Expectations
What are we expecting to happen?
When the flag is enabled, Search::GroupsFinder and Search::ProjectsFinder read the user's authorized groups/projects from Redis when an unexpired entry exists for the user's current cache version, and fall back to the PG query on a miss. Cache entries are per user, with a 5-minute TTL. Any membership, authorization, link, or role event listed in SearchSubscriptions#register_finder_cache_events bumps the per-user version via Search::ExpireFinderCacheWorker, so the next read rebuilds the cache.
Expected net effect: significantly lower PG load on every request path that queries the user's authorized groups / projects during search and during Knowledge Graph authorization (Orbit API, MCP).
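The per-user version scheme can be illustrated with a small read-through cache. This is a hypothetical Python model, not the GitLab implementation: FakeRedis, cached_authorized_ids, expire_user_cache, and the key layout are all invented for illustration, while the real finders are Ruby code backed by Gitlab::Redis::Cache. It shows why a version bump invalidates without deleting anything: the next read computes a new key, misses, and repopulates, while the superseded entry simply ages out after its 5-minute TTL.

```python
import time
from typing import Any, Callable, Dict, List, Optional, Tuple

TTL_SECONDS = 5 * 60  # 5-minute TTL, matching this issue


class FakeRedis:
    """In-memory stand-in for the three Redis commands this sketch needs."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._data: Dict[str, Tuple[Any, Optional[float]]] = {}

    def get(self, key: str) -> Any:
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if expires_at is not None and self._clock() >= expires_at:
            del self._data[key]  # expire lazily on access
            return None
        return value

    def set(self, key: str, value: Any, ex: Optional[int] = None) -> None:
        expires_at = self._clock() + ex if ex is not None else None
        self._data[key] = (value, expires_at)

    def incr(self, key: str) -> int:
        # In this sketch the version counter itself never expires.
        new_value = int(self.get(key) or 0) + 1
        self._data[key] = (new_value, None)
        return new_value


def version_key(user_id: int) -> str:
    return f"finder_cache_version:{user_id}"  # hypothetical key layout


def cached_authorized_ids(cache: FakeRedis, user_id: int, kind: str,
                          compute: Callable[[], List[int]]) -> List[int]:
    """Read-through cache keyed by user, current version, and finder kind."""
    version = cache.get(version_key(user_id)) or 0
    key = f"finder_cache:{user_id}:v{version}:{kind}"  # hypothetical key layout
    cached = cache.get(key)
    if cached is not None:
        return cached
    value = compute()  # stands in for the PG query on a cache miss
    cache.set(key, value, ex=TTL_SECONDS)
    return value


def expire_user_cache(cache: FakeRedis, user_id: int) -> None:
    """What an ExpireFinderCacheWorker-style job would do: bump the version."""
    cache.incr(version_key(user_id))


# Usage: a controllable clock makes TTL expiry deterministic.
now = [0.0]
cache = FakeRedis(clock=lambda: now[0])
pg_queries: List[int] = []


def load_groups() -> List[int]:
    pg_queries.append(1)
    return [10, 20, 30]


cached_authorized_ids(cache, 42, "groups", load_groups)  # miss: queries PG
cached_authorized_ids(cache, 42, "groups", load_groups)  # hit
expire_user_cache(cache, 42)                             # membership event
cached_authorized_ids(cache, 42, "groups", load_groups)  # miss: new version
now[0] += TTL_SECONDS                                    # 5 minutes pass
cached_authorized_ids(cache, 42, "groups", load_groups)  # miss: entry expired
print(len(pg_queries))  # 3
```

Keying on the version rather than deleting entries keeps invalidation to a single counter increment per user, at the cost of leaving superseded entries in Redis until their TTL runs out.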
What can go wrong and how would we detect it?
- Stale results after a membership change: if the event-store invalidation path is missed for a given kind of change, the user sees stale authorized groups/projects for up to 5 minutes (the cache TTL). Detect via reports such as "I was just added/removed but search still behaves as before". Mitigate by rolling back the flag.
- Redis memory growth: per-user keys with a 5-minute TTL scale with O(#active_users * #access_levels * #feature_variants). Monitor the search cache keyspace in Gitlab::Redis::Cache.
- Cache-miss thundering herd: many users whose cache has expired hit the finder at once. Redis and the DB should both absorb this, but watch PG search_user:* query rates during the incremental rollout.
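As a back-of-envelope aid, the key-count bound above can be turned into a rough estimate. The helper names and every input value below are hypothetical and illustrative, not measured GitLab.com figures. The extra factor for version bumps is an assumption: it presumes entries written under a superseded version are left to expire rather than deleted, and that each bump is followed by a read that repopulates the cache.

```python
def estimate_live_keys(active_users: int, access_levels: int,
                       feature_variants: int, bumps_within_ttl: int = 0) -> int:
    """Upper bound on live finder-cache keys.

    Base term follows O(#active_users * #access_levels * #feature_variants).
    The (1 + bumps_within_ttl) factor assumes superseded-version entries age
    out via their TTL and each bump is followed by a repopulating read.
    """
    base = active_users * access_levels * feature_variants
    return base * (1 + bumps_within_ttl)


def estimate_bytes(live_keys: int, avg_entry_bytes: int) -> int:
    """Rough memory footprint, ignoring Redis per-key overhead."""
    return live_keys * avg_entry_bytes


# Illustrative inputs only; these are not measured values.
keys = estimate_live_keys(active_users=10_000, access_levels=3,
                          feature_variants=2, bumps_within_ttl=1)
print(keys)                         # 120000
print(estimate_bytes(keys, 2_000))  # 240000000
```

Plugging in real figures from the redis-cache dashboard during the 1% step would show whether the per-user keyspace is tracking this bound.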
Most relevant dashboards: search-api, redis-cache, and the PG-query dashboards for Search::GroupsFinder / Search::ProjectsFinder.
Rollout Steps
Note: Please make sure to run the chatops commands in the Slack channel that is affected by the command.
Rollout on non-production environments
- Verify the MR with the feature flag is merged to master and has been deployed to non-production environments with:
/chatops gitlab run auto_deploy status <merge-commit-of-your-feature>
- Deploy the feature flag at 50% of actors on non-production:
/chatops gitlab run feature set search_finders_redis_cache 50 --actors --dev --pre --staging --staging-ref
- Confirm that error rates and search latency have not regressed.
- Enable the feature globally on non-production:
/chatops gitlab run feature set search_finders_redis_cache true --dev --pre --staging --staging-ref
- Verify on staging-canary that search returns the expected authorized groups/projects.
- Run the Orbit E2E validation (the Knowledge Graph query path exercises the same finder cache).
Rollout on production
- Enable on 1% of actors:
/chatops gitlab run feature set search_finders_redis_cache 1 --actors
- Monitor Redis memory and PG query rates for 24 hours.
- Scale to 10% of actors, then 25%, 50%, and 100%, monitoring the same metrics between steps.
- Enable globally:
/chatops gitlab run feature set search_finders_redis_cache true
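The --actors steps rely on a percentage-of-actors gate, which is deterministic per actor: raising the percentage only adds actors to the enabled cohort, so users who had the cache at 1% keep it at 10%, and the per-user Redis keyspace should grow roughly with the rollout percentage. GitLab feature flags are built on the Flipper gem; the Python sketch below loosely models its percentage-of-actors idea (hash of feature name plus actor id into a fixed bucket). The CRC32 hashing, SCALING_FACTOR, and actor id format are assumptions for illustration, not Flipper's exact implementation.

```python
import zlib

SCALING_FACTOR = 1_000  # lets fractional percentages such as 0.5 map to integer buckets


def actor_bucket(feature: str, actor_id: str) -> int:
    """Stable bucket in [0, 100 * SCALING_FACTOR) for a (feature, actor) pair."""
    digest = zlib.crc32(f"{feature}{actor_id}".encode("utf-8"))
    return digest % (100 * SCALING_FACTOR)


def actor_enabled(feature: str, actor_id: str, percentage: float) -> bool:
    """An actor is enabled when its fixed bucket falls below the threshold."""
    return actor_bucket(feature, actor_id) < percentage * SCALING_FACTOR


FEATURE = "search_finders_redis_cache"
actors = [f"User;{i}" for i in range(1, 20_001)]  # hypothetical actor ids

steps = (1, 10, 25, 50, 100)
cohorts = {}
for pct in steps:
    cohorts[pct] = {a for a in actors if actor_enabled(FEATURE, a, pct)}
    print(pct, len(cohorts[pct]))

# Each step only adds actors: nobody enabled at 1% drops out at 10%, and so on.
assert all(cohorts[a] <= cohorts[b] for a, b in zip(steps, steps[1:]))
```

Because an actor's bucket never changes, stale-result reports during the rollout can be traced back to a stable set of users rather than to requests that randomly toggle between cached and uncached reads.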
Rollout on GitLab.com
Same as production above.
Release the feature with the feature flag
- After the flag has shipped with default_enabled: true for at least one full release, remove the flag, delete the feature flag definition, and drop the SearchSubscriptions#register_finder_cache_events if: guards.
Rollback steps
If issues are observed:
- Disable globally:
/chatops gitlab run feature set search_finders_redis_cache false