[FF] `request_cost_headers` -- Roll out per-request cost / namespace response headers
Summary
This issue tracks the production rollout of the per-request cost / namespace response headers, which are currently behind the `request_cost_headers` feature flag (type: `gitlab_com_derisk`).
The flag gates two related code paths introduced in !230708 (merged):
- Reading the `x-gitaly-cost` gRPC trailer in `Gitlab::GitalyClient::Call` and accumulating per-request cost via `Gitlab::RequestCost.current`.
- Emitting `x-gitlab-score-gitaly` and `x-gitlab-namespace` response headers from `Gitlab::Middleware::RequestCost`.
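For reviewers unfamiliar with the change, the two gated paths can be pictured with a small self-contained sketch. This is not the code from !230708: `RequestCostSketch`, `CostHeadersMiddleware`, the `flag_enabled` lambda and the `sketch.*` env keys are hypothetical stand-ins, and it assumes the trailer value is a numeric string and the middleware is Rack-like.

```ruby
# Hypothetical stand-in for the per-request accumulator (not Gitlab::RequestCost).
class RequestCostSketch
  attr_reader :gitaly_score

  def initialize
    @gitaly_score = 0.0
  end

  # Path 1: called once per Gitaly RPC with that call's gRPC trailers.
  # Assumes the trailer value is a numeric string; malformed values are skipped.
  def accumulate_from_trailers(trailers)
    raw = trailers['x-gitaly-cost']
    return if raw.nil?

    @gitaly_score += Float(raw)
  rescue ArgumentError, TypeError
    nil
  end
end

# Path 2: Rack-style middleware that emits the response headers.
# `flag_enabled` is a hypothetical callable standing in for the feature flag check.
class CostHeadersMiddleware
  def initialize(app, flag_enabled:)
    @app = app
    @flag_enabled = flag_enabled
  end

  def call(env)
    # With the flag off, nothing is accumulated and nothing is emitted.
    return @app.call(env) unless @flag_enabled.call

    cost = RequestCostSketch.new
    env['sketch.request_cost'] = cost
    status, headers, body = @app.call(env)

    headers['x-gitlab-score-gitaly'] = cost.gitaly_score.to_s if cost.gitaly_score > 0
    namespace = env['sketch.root_namespace']
    headers['x-gitlab-namespace'] = namespace.to_s if namespace
    [status, headers, body]
  end
end

# Usage: a fake app that performs two "Gitaly RPCs" inside one request.
app = lambda do |env|
  env['sketch.request_cost']&.accumulate_from_trailers('x-gitaly-cost' => '1.5')
  env['sketch.request_cost']&.accumulate_from_trailers('x-gitaly-cost' => '2')
  env['sketch.root_namespace'] = 'gitlab-org'
  [200, {}, ['ok']]
end

_status, headers, _body = CostHeadersMiddleware.new(app, flag_enabled: -> { true }).call({})
puts headers.inspect
```

The header value format (a plain float string here) is an assumption; the real format is whatever `Gitlab::Middleware::RequestCost` emits.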
The headers feed Cloudflare's complexity-based rate-limiting rule, which buckets per namespace and accumulates the per-response score within a time window.
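To make the bucketing idea concrete, here is a toy model of accumulating scores per namespace within a time window. It is only an illustration: the Cloudflare rule is configured separately and is not described in this issue, so the fixed-window strategy, the `NamespaceScoreWindow` class, and the `window_seconds` and `limit` values are all assumptions.

```ruby
# Toy fixed-window accumulator keyed by namespace (hypothetical model,
# not Cloudflare's implementation).
class NamespaceScoreWindow
  def initialize(window_seconds:, limit:)
    @window_seconds = window_seconds
    @limit = limit
    @buckets = Hash.new { |hash, key| hash[key] = { window: nil, total: 0.0 } }
  end

  # Adds one response's score to the namespace's bucket.
  # Returns true while the namespace total for the current window is within the limit.
  def record(namespace, score, now:)
    window = now.to_i / @window_seconds
    bucket = @buckets[namespace]
    if bucket[:window] != window
      # A new window started: reset the running total.
      bucket[:window] = window
      bucket[:total] = 0.0
    end
    bucket[:total] += score
    bucket[:total] <= @limit
  end
end

limiter = NamespaceScoreWindow.new(window_seconds: 60, limit: 10)
puts limiter.record('gitlab-org', 6, now: 0)   # first response in the window
puts limiter.record('gitlab-org', 6, now: 30)  # same window, total is now 12
puts limiter.record('gitlab-org', 6, now: 61)  # next window, total resets
```

The point of the model is that one expensive request and many cheap ones draw from the same per-namespace budget, which is why the score header and the namespace header are emitted together.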
Owners
- Most appropriate Slack channel to reach out to: `#g_gitaly`
- Best individual to reach out to: @divya_gitlab
Expectations
What are we expecting to happen?
- Every response from `rails-web` should carry `x-gitlab-score-gitaly` (when Gitaly cost > 0) and `x-gitlab-namespace` (when a root namespace is in context).
- The score reflects the sum of `x-gitaly-cost` trailers set by Gitaly's `costhandler` middleware for every RPC in the request.
- `cost_score_gitaly` appears in the `production_json` log payload for visibility before the Cloudflare rule is enabled.
- No user-visible behaviour change; this MR only emits headers.
What can go wrong and how would we detect it?
- Trailer read errors: an exception in `accumulate_cost` is rescued and logged via `Gitlab::AppLogger.warn(message: "Failed to accumulate Gitaly cost", ...)`. A spike in those log lines would indicate a regression in either Rails or Gitaly's trailer behaviour.
- Latency regression: the new path runs on every Gitaly RPC. Watch the Rails Web latency dashboard and the Gitaly RPC latency dashboard before and after each rollout step.
- Cloudflare misbehaviour: the Cloudflare rule is configured separately. This flag only controls Rails emitting the headers; until the Cloudflare rule is created, the headers are observation-only.
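The rescue-and-log behaviour for trailer read errors, and the way a spike would be detected, can be sketched as follows. This is a minimal illustration using Ruby's standard `Logger` rather than `Gitlab::AppLogger`; the method signature, the `totals` hash and the `failure_count` helper are hypothetical, and only the log message text comes from this issue.

```ruby
require 'logger'
require 'stringio'

FAILURE_MESSAGE = 'Failed to accumulate Gitaly cost'

# Adds the trailer's cost to the running total. Any error is logged as a
# warning and swallowed, so a bad trailer never fails the request.
def accumulate_cost(trailers, totals, logger)
  raw = trailers['x-gitaly-cost']
  return totals if raw.nil? # No trailer is not treated as an error here (assumption).

  totals[:gitaly] += Float(raw)
  totals
rescue StandardError => e
  logger.warn("#{FAILURE_MESSAGE}: #{e.class}: #{e.message}")
  totals
end

# Counts warning lines in captured log text; a rising count after a rollout
# step is the signal to look for.
def failure_count(log_text)
  log_text.each_line.count { |line| line.include?(FAILURE_MESSAGE) }
end

io = StringIO.new
logger = Logger.new(io)
totals = { gitaly: 0.0 }
[{ 'x-gitaly-cost' => '2.5' }, { 'x-gitaly-cost' => 'oops' }, {}].each do |trailers|
  accumulate_cost(trailers, totals, logger)
end
puts "total=#{totals[:gitaly]} failures=#{failure_count(io.string)}"
```

In production the equivalent check would be a log search on the message text, compared before and after each rollout step.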
Rollout Steps
Note: Please make sure to run the chatops commands in the Slack channel that gets impacted by the command.
This flag is `gitlab_com_derisk` (GitLab.com-only). The plan is to enable it for a small group, observe, then expand globally, and remove the flag once the behaviour is stable.
Rollout on non-production environments
- Verify the MR with the feature flag is merged to `master` and has been deployed to non-production environments with `/chatops gitlab run auto_deploy status <merge-commit-of-this-mr>`
- Enable on non-production: `/chatops gitlab run feature set request_cost_headers true --dev --pre --staging --staging-ref`
- Verify that the feature works as expected. The best environment to validate the feature in is `staging-canary` as this is the first environment deployed to. Make sure you are configured to use canary.
- If the feature flag causes end-to-end tests to fail, disable the feature flag on staging to avoid blocking deployments.
Specific rollout on production
For visibility, all /chatops commands that target production must be executed in the #production Slack channel and cross-posted (with the command results) to the responsible team's Slack channel.
- Ensure that the feature MRs have been deployed to both production and canary with `/chatops gitlab run auto_deploy status <merge-commit-of-this-mr>`
- Enable for `gitlab-org` first: `/chatops gitlab run feature set --group=gitlab-org request_cost_headers true`
- Verify the headers appear and `cost_score_gitaly` is logged for requests against repos in this group.
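One way to spot-check the last step is to scan a sample of exported `production_json` lines for entries from the group. The sketch below is an assumption-laden helper, not an existing tool: `cost_scores_for` is hypothetical, and the `meta.root_namespace` field name is assumed to be how the log entry identifies the group; only `cost_score_gitaly` is named in this issue.

```ruby
require 'json'

# Returns the cost_score_gitaly values logged for one root namespace.
# Lines that are not valid JSON or lack the score are ignored.
def cost_scores_for(log_lines, root_namespace)
  log_lines.each_with_object([]) do |line, scores|
    entry = begin
      JSON.parse(line)
    rescue JSON::ParserError
      next
    end
    # `meta.root_namespace` is an assumed field name for the request's group.
    next unless entry.is_a?(Hash) && entry['meta.root_namespace'] == root_namespace

    score = entry['cost_score_gitaly']
    scores << score unless score.nil?
  end
end

sample = [
  '{"meta.root_namespace":"gitlab-org","cost_score_gitaly":3.5}',
  'not json',
  '{"meta.root_namespace":"other-group","cost_score_gitaly":9}',
  '{"meta.root_namespace":"gitlab-org"}'
]
puts cost_scores_for(sample, 'gitlab-org').inspect
```

An empty result for `gitlab-org` after the flag is enabled for the group would suggest the score is not being logged and is worth investigating before expanding the rollout.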
Preparation before global rollout
- Set a milestone to this rollout issue to signal for enabling and removing the feature flag when it is stable.
- Check if the feature flag change needs to be accompanied with a change management issue. Cross link the issue here if it does.
- Ensure that the DRI or a representative can be available for at least 2 hours after feature flag updates in production.
- Notify the `#support_gitlab-com` Slack channel and `#g_gitaly` (more guidance when this is necessary in the dev docs).
Global rollout on production
For visibility, all /chatops commands that target production must be executed in the #production Slack channel and cross-posted (with the command results) to the responsible team's Slack channel.
- Enable globally: `/chatops gitlab run feature set request_cost_headers true`. Monitor the appropriate graphs on https://dashboards.gitlab.net for at least 15 minutes.
- After the feature has been 100% enabled, wait for at least one day before releasing the feature.
Release the feature
After the feature has been deemed stable, the clean up should be done as soon as possible to permanently enable the feature and reduce complexity in the codebase.
- Create a merge request to remove the `request_cost_headers` feature flag. Ask for review/approval/merge as usual. The MR should include the following changes:
  - Remove all references to the feature flag from the codebase.
  - Remove the YAML definitions for the feature from the repository.
- Ensure that the cleanup MR has been included in the release package.
- Close the feature issue to indicate the feature will be released in the current milestone.
- Once the cleanup MR has been deployed to production, clean up the feature flag from all environments by running this chatops command in the `#production` channel: `/chatops gitlab run feature delete request_cost_headers --dev --pre --staging --staging-ref --production`
- Close this rollout issue.
Rollback Steps
- This feature can be disabled on production by running the following Chatops command: `/chatops gitlab run feature set request_cost_headers false`
- Disable the feature flag on non-production environments: `/chatops gitlab run feature set request_cost_headers false --dev --pre --staging --staging-ref`
- Delete feature flag from all environments: `/chatops gitlab run feature delete request_cost_headers --dev --pre --staging --staging-ref --production`