[FF] `request_cost_headers` -- Roll out per-request cost / namespace response headers


Summary

This issue tracks the production rollout of per-request cost / namespace response headers, which are currently behind the request_cost_headers feature flag (type: gitlab_com_derisk).

The flag gates two related code paths introduced in !230708 (merged):

  1. Reading the x-gitaly-cost gRPC trailer in Gitlab::GitalyClient::Call and accumulating per-request cost via Gitlab::RequestCost.current.
  2. Emitting x-gitlab-score-gitaly and x-gitlab-namespace response headers from Gitlab::Middleware::RequestCost.
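
The first code path can be pictured with a minimal sketch. RequestCostSketch and accumulate_from_trailers are hypothetical names, not the real Gitlab::RequestCost API, and the sketch assumes the x-gitaly-cost trailer carries a numeric string.

```ruby
# Hypothetical stand-in for a per-request cost accumulator.
# This is NOT the real Gitlab::RequestCost; names and shape are assumptions.
class RequestCostSketch
  attr_reader :gitaly_score

  def initialize
    @gitaly_score = 0.0
  end

  # trailers: Hash of gRPC trailing metadata for one RPC,
  # e.g. { "x-gitaly-cost" => "12.5" } (value format is an assumption).
  def accumulate_from_trailers(trailers)
    raw = trailers["x-gitaly-cost"]
    return if raw.nil? # RPC reported no cost; nothing to add

    @gitaly_score += Float(raw)
  end
end

# One request that issued three Gitaly RPCs, one of them without a trailer.
cost = RequestCostSketch.new
cost.accumulate_from_trailers({ "x-gitaly-cost" => "12.5" })
cost.accumulate_from_trailers({})
cost.accumulate_from_trailers({ "x-gitaly-cost" => "3" })
puts cost.gitaly_score
```

The accumulator is deliberately per-request: a fresh instance per request keeps one request's RPC costs from leaking into another's score.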

The headers feed Cloudflare's complexity-based rate-limiting rule, which buckets per namespace and accumulates the per-response score within a time window.
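
To illustrate what "buckets per namespace and accumulates within a time window" means, here is a rough fixed-window sketch. It is not Cloudflare's actual algorithm or configuration; the class name, window length, and limit are all hypothetical.

```ruby
# Illustrative fixed-window accumulator keyed by namespace.
# NOT Cloudflare's implementation; window_seconds and limit are made-up knobs.
class NamespaceScoreWindowSketch
  def initialize(window_seconds:, limit:)
    @window_seconds = window_seconds
    @limit = limit
    @buckets = Hash.new(0.0) # [namespace, window index] => accumulated score
  end

  # Adds one response's score to the namespace's bucket for the window that
  # contains `now` (seconds). Returns true while the bucket is within the limit.
  def record(namespace, score, now)
    key = [namespace, now.to_i / @window_seconds]
    @buckets[key] += score
    @buckets[key] <= @limit
  end
end

window = NamespaceScoreWindowSketch.new(window_seconds: 60, limit: 100)
window.record("gitlab-org", 60, 0)  # within budget
window.record("gitlab-org", 50, 30) # same window, total 110: over budget
window.record("gitlab-org", 50, 61) # next window starts from zero again
```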

Owners

  • Most appropriate Slack channel to reach out to: #g_gitaly
  • Best individual to reach out to: @divya_gitlab

Expectations

What are we expecting to happen?

  • Every response from rails-web should carry x-gitlab-score-gitaly (when Gitaly cost > 0) and x-gitlab-namespace (when a root namespace is in context).
  • The score reflects the sum of x-gitaly-cost trailers set by Gitaly's costhandler middleware for every RPC in the request.
  • cost_score_gitaly appears in the production_json log payload for visibility before the Cloudflare rule is enabled.
  • No user-visible behaviour change; the change only emits headers.
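
The header conditions above can be sketched as a Rack-style middleware. This is an assumption-laden illustration, not the real Gitlab::Middleware::RequestCost: the class name and the cost_reader / namespace_reader callables are hypothetical, and the format of the namespace value is not specified in this issue.

```ruby
# Hypothetical Rack-style middleware; the real implementation may differ.
class RequestCostHeadersSketch
  SCORE_HEADER = "x-gitlab-score-gitaly"
  NAMESPACE_HEADER = "x-gitlab-namespace"

  # cost_reader and namespace_reader are hypothetical callables that return
  # the accumulated Gitaly cost and the root namespace for the request.
  def initialize(app, cost_reader:, namespace_reader:)
    @app = app
    @cost_reader = cost_reader
    @namespace_reader = namespace_reader
  end

  def call(env)
    status, headers, body = @app.call(env)

    score = @cost_reader.call(env)
    namespace = @namespace_reader.call(env)

    # Emit the score only when Gitaly cost > 0.
    headers[SCORE_HEADER] = score.to_s if score && score > 0
    # Emit the namespace only when a root namespace is in context.
    headers[NAMESPACE_HEADER] = namespace.to_s if namespace

    [status, headers, body]
  end
end

app = ->(_env) { [200, {}, ["ok"]] }
middleware = RequestCostHeadersSketch.new(
  app,
  cost_reader: ->(_env) { 7.5 },
  namespace_reader: ->(_env) { "gitlab-org" }
)
_status, headers, _body = middleware.call({})
puts headers.inspect
```

Because the middleware only adds headers after the inner app has responded, it leaves the status and body untouched, which matches the expectation of no user-visible change.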

What can go wrong and how would we detect it?

  • Trailer read errors: an exception in accumulate_cost is rescued and logged via Gitlab::AppLogger.warn(message: "Failed to accumulate Gitaly cost", ...). A spike in those log lines would indicate a regression in either Rails or Gitaly's trailer behaviour.
  • Latency regression: the new path runs on every Gitaly RPC. Watch the Rails Web latency dashboard and the Gitaly RPC latency dashboard before and after each rollout step.
  • Cloudflare misbehaviour: the Cloudflare rule is configured separately. This flag only controls Rails emitting the headers; until the Cloudflare rule is created, the headers are observation-only.
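
The rescue-and-log behaviour described for trailer read errors might look roughly like the sketch below. It uses Ruby's stdlib Logger in place of Gitlab::AppLogger, and accumulate_cost_safely and the totals hash are hypothetical; only the log message text comes from this issue.

```ruby
require "logger"
require "stringio"

# Hypothetical helper: adds the RPC's cost to totals[:gitaly], and logs a
# warning instead of raising if the trailer cannot be parsed.
def accumulate_cost_safely(trailers, totals, logger)
  raw = trailers["x-gitaly-cost"]
  totals[:gitaly] += Float(raw) unless raw.nil?
rescue StandardError => e
  # The request must not fail because cost accounting failed.
  logger.warn("Failed to accumulate Gitaly cost: #{e.class}")
end

log_output = StringIO.new
logger = Logger.new(log_output)
totals = { gitaly: 0.0 }

accumulate_cost_safely({ "x-gitaly-cost" => "2.5" }, totals, logger)
accumulate_cost_safely({ "x-gitaly-cost" => "not-a-number" }, totals, logger)

puts totals[:gitaly]
puts log_output.string
```

A malformed trailer leaves the running total unchanged and produces one warning line, which is the signal to watch for during rollout.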

Rollout Steps

Note: Please make sure to run the chatops commands in the Slack channel that gets impacted by the command.

This flag is gitlab_com_derisk (GitLab.com-only). The plan is to enable on a small group, observe, then expand globally, and remove the flag once the behaviour is stable.

Rollout on non-production environments

  • Verify the MR with the feature flag is merged to master and has been deployed to non-production environments with /chatops gitlab run auto_deploy status <merge-commit-of-this-mr>
  • Enable on non-production: /chatops gitlab run feature set request_cost_headers true --dev --pre --staging --staging-ref
  • Verify that the feature works as expected. The best environment to validate the feature in is staging-canary as this is the first environment deployed to. Make sure you are configured to use canary.
  • If the feature flag causes end-to-end tests to fail, disable the feature flag on staging to avoid blocking deployments.

Specific rollout on production

For visibility, all /chatops commands that target production must be executed in the #production Slack channel and cross-posted (with the command results) to the responsible team's Slack channel.

  • Ensure that the feature MRs have been deployed to both production and canary with /chatops gitlab run auto_deploy status <merge-commit-of-this-mr>
  • Enable for gitlab-org first: /chatops gitlab run feature set --group=gitlab-org request_cost_headers true
  • Verify the headers appear and cost_score_gitaly is logged for requests against repos in this group.
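
One way to spot-check the headers is a small script like the sketch below. The helper names are hypothetical, the URL you pass is up to you, and requests to non-public projects would additionally need authentication, which this sketch does not handle.

```ruby
require "net/http"
require "uri"

COST_HEADERS = %w[x-gitlab-score-gitaly x-gitlab-namespace].freeze

# Picks the two cost-related headers out of a Net::HTTPResponse.
# A header that is absent maps to nil.
def cost_headers_from(response)
  COST_HEADERS.to_h { |name| [name, response[name]] }
end

# Hypothetical convenience wrapper: issues a GET and reports the headers.
def fetch_cost_headers(url)
  cost_headers_from(Net::HTTP.get_response(URI(url)))
end

# Example (not executed here): pass the URL of a repository page in the group.
# puts fetch_cost_headers(some_project_url).inspect
```

Per the expectations above, a nil x-gitlab-score-gitaly is not necessarily a failure, since the header is only emitted when the Gitaly cost is greater than zero.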

Preparation before global rollout

Global rollout on production

For visibility, all /chatops commands that target production must be executed in the #production Slack channel and cross-posted (with the command results) to the responsible team's Slack channel.

Release the feature

After the feature has been deemed stable, the cleanup should be done as soon as possible to permanently enable the feature and reduce complexity in the codebase.

  • Create a merge request to remove the request_cost_headers feature flag. Ask for review/approval/merge as usual. The MR should include the following changes:
    • Remove all references to the feature flag from the codebase.
    • Remove the YAML definitions for the feature from the repository.
  • Ensure that the cleanup MR has been included in the release package.
  • Close the feature issue to indicate the feature will be released in the current milestone.
  • Once the cleanup MR has been deployed to production, clean up the feature flag from all environments by running this chatops command in the #production channel: /chatops gitlab run feature delete request_cost_headers --dev --pre --staging --staging-ref --production
  • Close this rollout issue.

Rollback Steps

  • This feature can be disabled on production by running the following Chatops command:
/chatops gitlab run feature set request_cost_headers false
  • Disable the feature flag on non-production environments:
/chatops gitlab run feature set request_cost_headers false --dev --pre --staging --staging-ref
  • Delete feature flag from all environments:
/chatops gitlab run feature delete request_cost_headers --dev --pre --staging --staging-ref --production
Edited by Divya Rani