[Feature flag] Apply rate-limiting to webhook executions

Feature

This feature uses the :web_hooks_rate_limit feature flag!

!61151 (merged) introduces the functionality, which is disabled by default (both through the feature flag and because no threshold is defined yet).

Owners

  • Team: ~"group::ecosystem"
  • Most appropriate Slack channel to reach out to: #g_create_ecosystem-be
  • Best individual to reach out to: @toupeira
  • PM: @deuley

Stakeholders

The Rollout Plan

This issue only focuses on the rollout for the Free plan on gitlab.com.

Possible follow-up issues:

  • Adding thresholds on paid plans.
  • Adding a default threshold for self-managed instances.

Expectations

What are we expecting to happen?

Frequently called webhooks will get rate-limited.
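As a rough illustration of the expected behaviour, here is a minimal fixed-window limiter in Python. It is only a sketch under stated assumptions, not the implementation from !61151: the class name, the per-hook key, the plan names, and the fixed-window strategy are all hypothetical. A plan with no threshold is never throttled, mirroring the "no threshold defined yet" state described above.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class FixedWindowRateLimiter:
    """Hypothetical fixed-window limiter; not GitLab's actual implementation."""

    def __init__(
        self,
        thresholds: Dict[str, Optional[int]],
        interval_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        # Plan name -> allowed calls per interval (None means "no limit").
        self.thresholds = thresholds
        self.interval_seconds = interval_seconds
        self.clock = clock
        # Key -> (window start time, calls counted in that window).
        self._windows: Dict[str, Tuple[float, int]] = {}

    def allow(self, key: str, plan: str) -> bool:
        """Return True if the call may proceed, False if it is rate-limited."""
        limit = self.thresholds.get(plan)
        if limit is None:
            return True  # No threshold defined for this plan: never throttle.

        now = self.clock()
        start, count = self._windows.get(key, (now, 0))
        if now - start >= self.interval_seconds:
            start, count = now, 0  # Interval elapsed: start a fresh window.

        if count >= limit:
            self._windows[key] = (start, count)
            return False  # Over the threshold within the current window.

        self._windows[key] = (start, count + 1)
        return True


if __name__ == "__main__":
    fake_now = [0.0]  # Injectable clock so the interval can be stepped manually.
    limiter = FixedWindowRateLimiter(
        {"free": 3, "premium": None}, clock=lambda: fake_now[0]
    )
    print([limiter.allow("hook-1", "free") for _ in range(4)])
    fake_now[0] = 60.0
    print(limiter.allow("hook-1", "free"))
```

With the illustrative threshold of 3, the fourth call inside the interval is rejected, and calls are accepted again once the 60-second window has elapsed. Calls on the hypothetical "premium" plan always pass because it has no threshold.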

What might happen if this goes wrong?

We might set the threshold too low and break users' workflows.

What can we monitor to detect problems with this?

Rate limit events (Rails):

  • Staging: https://nonprod-log.gitlab.net/goto/51d8ebf49baf1f84ed7a6f443bfffeb5
  • Production: https://log.gprd.gitlab.net/goto/f327f3c32a524be2be2a38e43bf8cffe

Rate limit events (Sidekiq):

  • Staging: https://nonprod-log.gitlab.net/goto/f81cb098d007be1ea735bf702bd1e88d
  • Production: https://log.gprd.gitlab.net/goto/cd9cdcae88393e22e822cd8f37b4b46d

(Note: The log source depends on whether the webhook was triggered from a web request or a job worker)

Rollout Timeline

Rollout Steps

Preparation Phase

  • Enable on staging (/chatops run feature set web_hooks_rate_limit true --staging)

  • Verify behaviour on staging

    • Set a temporary threshold for the Free plan.
    • Verify the rate limiting behaves as expected (rate-limit takes effect, resets after the interval, doesn't affect non-Free plans)
    • Reset the temporary threshold.
  • Ensure that documentation has been updated (More info)

  • Check that !62130 (merged) is deployed to gitlab.com.

  • Enable on production (/chatops run feature set web_hooks_rate_limit true)

    • No threshold is defined yet so this won't have an effect, but as a side-effect of checking the plan limits we'll also log the subscription plan in Kibana.
  • Determine a suitable threshold for the Free plan, based on usage patterns in Kibana.

  • Submit an MR to:

  • Disable on production (/chatops run feature set web_hooks_rate_limit false)
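The "determine a suitable threshold" step above could be approached along these lines. This is a sketch only: it assumes per-minute call counts per webhook have already been exported from Kibana into a plain dictionary, and the function names, the sample data, and the nearest-rank percentile method are all assumptions made for illustration. The value 120 echoes the figure in the production verification step and is not a recommendation.

```python
import math
from typing import Dict, List


def peak_calls_per_minute(counts_by_hook: Dict[str, List[int]]) -> Dict[str, int]:
    """Return the busiest observed minute for each hook."""
    return {hook: max(counts, default=0) for hook, counts in counts_by_hook.items()}


def percentile(values: List[int], pct: float) -> int:
    """Nearest-rank percentile (0 < pct <= 100) of a non-empty list."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def hooks_over_threshold(
    counts_by_hook: Dict[str, List[int]], threshold: int
) -> List[str]:
    """List hooks whose busiest minute exceeds the candidate threshold."""
    peaks = peak_calls_per_minute(counts_by_hook)
    return sorted(hook for hook, peak in peaks.items() if peak > threshold)


if __name__ == "__main__":
    # Made-up sample data: calls per minute for four hypothetical hooks.
    sample = {
        "hook-a": [2, 5, 3],
        "hook-b": [40, 90, 10],
        "hook-c": [300, 10, 0],
        "hook-d": [1, 1, 1],
    }
    peaks = list(peak_calls_per_minute(sample).values())
    print("median peak:", percentile(peaks, 50))
    print("affected at 120/min:", hooks_over_threshold(sample, 120))
```

Comparing a few candidate thresholds this way shows how many existing hooks each one would throttle, which helps avoid setting the limit low enough to break established workflows.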

Full Rollout Phase

  • Announce on the issue an estimated time this will be enabled on GitLab.com

  • Check if the feature flag change needs to be accompanied with a change management issue. Cross link the issue here if it does.

  • Ensure that you or a representative in development can be available for at least 2 hours after feature flag updates in production. If a different developer will be covering, or an exception is needed, please inform the oncall SRE by using the @sre-oncall Slack alias.

  • Notify about the upcoming change in #support_gitlab-com (more guidance when this is necessary in the dev docs) and in your team channel

  • After the %14.0 release announcement on June 22:

    • Enable on production (/chatops run feature set web_hooks_rate_limit true)
    • Verify the behaviour on production (trigger more than 120 webhook calls per minute and check logs)
    • Announce on the issue that the flag has been enabled
  • Submit an MR to make the feature flag enabled by default.

  • Wait for the MR to be deployed.

  • Remove the feature flag on all environments.

Rollback Steps

  • This feature can be disabled by running the following Chatops command:
/chatops run feature set web_hooks_rate_limit false
Edited by Markus Koller