UX considerations for auto-disabled and rate-limited webhooks
Background
In https://gitlab.com/gitlab-org/gitlab/-/issues/329213 (scenarios 1 + 2) and https://gitlab.com/gitlab-org/gitlab/-/issues/329207 (scenario 3) we've started the groundwork for dealing with misbehaving webhooks. Both are behind feature flags and not yet enabled on GitLab.com.
Overview
# | Scenario | Impact | User action required?
---|---|---|---
1 | Webhook fails with "expected" errors (HTTP 4xx) | Webhook is disabled (after 3 failures) | Yes, verify the endpoint and re-enable the webhook
2 | Webhook fails with "unexpected" errors (HTTP 500, network errors, etc.) | Webhook is retried with exponential backoff (starting at 10m, up to 24h) | No, webhook will keep getting retried and recovers if the endpoint stops misbehaving |
3 | Webhook gets called too frequently | Webhook calls are blocked (for up to 1 minute) | No, webhook recovers after the rate-limit interval has passed |
For 1 and 2, we currently store some information about the failures, and also create an entry in web_hook_logs (as with all webhook calls, for both successes and failures). These log entries are already exposed in the UI.
For 3, the webhook call gets silently dropped and we log to auth.log, which is only visible to admins.
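The failure handling in scenarios 1 and 2 can be sketched as follows. This is a minimal Python illustration, not GitLab's implementation: the names `HookState`, `next_backoff`, and `record_response` are hypothetical, and the doubling factor, the reset on success, and the treatment of every outcome that is neither 2xx nor 4xx as "unexpected" are assumptions. Only the threshold of 3 failures, the 10-minute starting delay, and the 24-hour cap come from the table above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

FAILURE_THRESHOLD = 3                     # scenario 1: disable after 3 failures
INITIAL_BACKOFF = timedelta(minutes=10)   # scenario 2: first retry delay
MAX_BACKOFF = timedelta(hours=24)         # scenario 2: upper bound on the delay


@dataclass
class HookState:
    """Hypothetical per-webhook failure state; field names are illustrative."""
    recent_failures: int = 0
    backoff_count: int = 0
    disabled_until: Optional[datetime] = None

    def permanently_disabled(self) -> bool:
        # Scenario 1: stays disabled until a user re-enables the webhook.
        return self.recent_failures >= FAILURE_THRESHOLD


def next_backoff(backoff_count: int) -> timedelta:
    """Delay before the next retry, assuming the delay doubles each time."""
    # Clamp the exponent so the multiplication cannot overflow timedelta.
    delay = INITIAL_BACKOFF * (2 ** min(backoff_count, 16))
    return min(delay, MAX_BACKOFF)


def record_response(state: HookState, status: Optional[int], now: datetime) -> None:
    """Update the state after one webhook call; status=None means a network error."""
    if status is not None and 200 <= status < 300:
        # Assumption: a successful call clears all failure state.
        state.recent_failures = 0
        state.backoff_count = 0
        state.disabled_until = None
    elif status is not None and 400 <= status < 500:
        # Scenario 1: "expected" error, counts towards the disable threshold.
        state.recent_failures += 1
    else:
        # Scenario 2: "unexpected" error, pause the webhook and retry later.
        state.disabled_until = now + next_backoff(state.backoff_count)
        state.backoff_count += 1
```

Under these assumptions the retry delays run 10m, 20m, 40m and so on, and reach the 24-hour cap on the ninth consecutive unexpected failure.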
Possible UX improvements
Allow resetting failed webhooks
This is definitely needed for scenario 1, since those webhooks stay disabled until a user re-enables them, and might make sense for scenario 2 as well.
Highlight misbehaving webhooks in the UI
We could mark affected webhooks in the listings shown in the project and group settings, as well as the admin area.
For 1 and 2 we can use the information we store in the DB.
For 3 the rate-limiting state is stored in Redis, but this could be queried as well.
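As a rough sketch of how a listing could derive a single status per webhook from these two sources, consider the Python example below. Everything in it is hypothetical: `StoredHook` and its fields stand in for whatever failure information is stored in the DB, and `is_rate_limited` is a placeholder callable for the Redis lookup, whose actual key layout is not described here.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum
from typing import Callable, Optional


class HookStatus(Enum):
    OK = "ok"
    DISABLED = "disabled"          # scenario 1: needs user action
    BACKING_OFF = "backing_off"    # scenario 2: will be retried automatically
    RATE_LIMITED = "rate_limited"  # scenario 3: recovers after the interval


@dataclass
class StoredHook:
    """Hypothetical view of the failure information stored in the DB."""
    id: int
    recent_failures: int = 0
    disabled_until: Optional[datetime] = None


def hook_status(
    hook: StoredHook,
    now: datetime,
    is_rate_limited: Callable[[int], bool],
    failure_threshold: int = 3,
) -> HookStatus:
    """Return the status to highlight for one webhook in a settings listing."""
    if hook.recent_failures >= failure_threshold:
        return HookStatus.DISABLED
    if hook.disabled_until is not None and hook.disabled_until > now:
        return HookStatus.BACKING_OFF
    # Placeholder for the Redis query; only reached for otherwise healthy hooks.
    if is_rate_limited(hook.id):
        return HookStatus.RATE_LIMITED
    return HookStatus.OK
```

Checking the DB-backed state first means the Redis lookup only runs for webhooks that are otherwise healthy, and the most actionable state (disabled, requiring user action) takes precedence in the listing.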
Send notifications
We could consider sending email notifications to relevant administrators/owners, especially for scenario 1, which requires user action.