[Feature flag] Apply rate-limiting to webhook executions
Feature
This feature uses the :web_hooks_rate_limit feature flag!
!61151 (merged) introduces the functionality, which is disabled by default (both through the feature flag and because no threshold is defined yet).
Owners
- Team: ~"group::ecosystem"
- Most appropriate Slack channel to reach out to: #g_create_ecosystem-be
- Best individual to reach out to: @toupeira
- PM: @deuley
Stakeholders
The Rollout Plan
This issue only focuses on the rollout for the Free plan on gitlab.com.
Possible follow-up issues:
- Adding thresholds on paid plans.
- Adding a default threshold for self-managed instances.
Expectations
What are we expecting to happen?
Frequently called webhooks will get rate-limited.
What might happen if this goes wrong?
We might set the threshold too low and break users' workflows.
What can we monitor to detect problems with this?
| | Staging | Production |
|---|---|---|
| Rate limit events (Rails) | https://nonprod-log.gitlab.net/goto/51d8ebf49baf1f84ed7a6f443bfffeb5 | https://log.gprd.gitlab.net/goto/f327f3c32a524be2be2a38e43bf8cffe | 
| Rate limit events (Sidekiq) | https://nonprod-log.gitlab.net/goto/f81cb098d007be1ea735bf702bd1e88d | https://log.gprd.gitlab.net/goto/cd9cdcae88393e22e822cd8f37b4b46d | 
(Note: The log source depends on whether the webhook was triggered from a web request or a job worker)
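One way to reduce the risk of setting the threshold too low is to compare a candidate value against observed call rates before enforcing it. The sketch below assumes the rate limit events have been exported from the logs as `(key, timestamp)` pairs, where `key` is whatever the limit is scoped to. The function names, the sample data, and the use of 120 calls per minute as the candidate value are illustrative assumptions, not part of GitLab.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def peak_calls_per_minute(events: Iterable[Tuple[str, float]]) -> Dict[str, int]:
    """Return, per key, the highest call count seen in any one-minute bucket."""
    buckets: Counter = Counter()
    for key, timestamp in events:
        # Bucket each call by the minute it falls into (timestamps in seconds).
        buckets[(key, int(timestamp // 60))] += 1
    peaks: Dict[str, int] = {}
    for (key, _minute), count in buckets.items():
        peaks[key] = max(peaks.get(key, 0), count)
    return peaks


def keys_over_threshold(peaks: Dict[str, int], threshold: int) -> List[str]:
    """List the keys whose busiest minute exceeds the candidate threshold."""
    return sorted(key for key, peak in peaks.items() if peak > threshold)


# Made-up sample data: one quiet hook and one very busy hook.
sample_events = (
    [("hook-a", float(s)) for s in range(0, 60, 10)]      # 6 calls in minute 0
    + [("hook-b", 60.0 + i * 0.25) for i in range(200)]   # 200 calls in minute 1
)

peaks = peak_calls_per_minute(sample_events)
print(peaks)
print(keys_over_threshold(peaks, threshold=120))
```

Running the same check with several candidate thresholds gives a rough idea of how many keys each value would affect.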
Rollout Timeline
Rollout Steps
Preparation Phase
- Enable on staging (/chatops run feature set web_hooks_rate_limit true --staging)
- Verify behaviour on staging:
  - Set a temporary threshold for the Free plan.
  - Verify the rate limiting behaves as expected (rate-limit takes effect, resets after the interval, doesn't affect non-Free plans).
  - Reset the temporary threshold.
- Ensure that documentation has been updated (More info)
- Check that !62130 (merged) is deployed to gitlab.com.
- Enable on production (/chatops run feature set web_hooks_rate_limit true)
  - No threshold is defined yet so this won't have an effect, but as a side-effect of checking the plan limits we'll also log the subscription plan in Kibana.
- Determine a suitable threshold for the Free plan, based on usage patterns in Kibana.
- Submit an MR to:
  - Add a migration to set the threshold for the Free plan on gitlab.com.
  - Document the threshold on https://docs.gitlab.com/ee/user/gitlab_com/index.html#webhooks.
  - !62918 (merged)
- Disable on production (/chatops run feature set web_hooks_rate_limit false)
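The three staging checks listed above (the limit takes effect, it resets after the interval, and non-Free plans are unaffected) can be pictured with a small fixed-window counter. This is only a sketch of the expected behaviour, not GitLab's implementation: `FixedWindowLimiter`, `PLAN_LIMITS`, the plan names, the limit of 3, and the generic `key` the limit is scoped to are all hypothetical.

```python
from typing import Dict, Optional, Tuple

# Hypothetical per-plan limits (calls per window); None means no threshold is defined.
PLAN_LIMITS: Dict[str, Optional[int]] = {"free": 3, "premium": None}


class FixedWindowLimiter:
    """Counts calls per (key, window) and rejects calls above the plan's limit."""

    def __init__(self, window_seconds: int = 60) -> None:
        self.window_seconds = window_seconds
        self._counts: Dict[Tuple[str, int], int] = {}

    def allow(self, key: str, plan: str, now: float) -> bool:
        limit = PLAN_LIMITS.get(plan)
        if limit is None:
            return True  # no threshold defined for this plan: never throttled
        # Calls in the same window share a counter; a new window starts from zero.
        window = int(now // self.window_seconds)
        bucket = (key, window)
        self._counts[bucket] = self._counts.get(bucket, 0) + 1
        return self._counts[bucket] <= limit


limiter = FixedWindowLimiter()

# 1. The limit takes effect: the fourth call within the same minute is rejected.
free_results = [limiter.allow("hook-1", "free", now=t) for t in (0, 1, 2, 3)]

# 2. It resets after the interval: the first call in the next minute is allowed.
after_reset = limiter.allow("hook-1", "free", now=60)

# 3. Plans without a threshold are never throttled.
paid_results = [limiter.allow("hook-2", "premium", now=t) for t in range(10)]

print(free_results, after_reset, all(paid_results))
```

A fixed window is used here only because it is the simplest scheme that shows all three checks; the real limiter may count differently.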
Full Rollout Phase
- Announce on the issue an estimated time this will be enabled on GitLab.com
- Check if the feature flag change needs to be accompanied with a change management issue. Cross link the issue here if it does.
- Ensure that you or a representative in development can be available for at least 2 hours after feature flag updates in production. If a different developer will be covering, or an exception is needed, please inform the oncall SRE by using the @sre-oncall Slack alias.
- Notify about the upcoming change in #support_gitlab-com (more guidance when this is necessary in the dev docs) and in your team channel
- After the %14.0 release announcement on June 22:
  - Enable on production (/chatops run feature set web_hooks_rate_limit true)
  - Verify the behaviour on production (trigger more than 120 webhook calls per minute and check logs)
  - Announce on the issue that the flag has been enabled
- Submit an MR to make the feature flag enabled by default.
- Wait for the MR to be deployed.
- Remove the feature flag on all environments.
Rollback Steps
- This feature can be disabled by running the following ChatOps command:
  /chatops run feature set web_hooks_rate_limit false