Disable audit event streaming destinations which are not accessible
Problem
The Kibana logs for AuditEvents::AuditEventStreamingWorker at https://log.gprd.gitlab.net/app/r/s/BghRN show that nearly 96% of the errors in the worker have the exception message "URL is blocked: Host cannot be resolved or invalid", and nearly all of them are for the same root_namespace.
This error occurs because the HTTP streaming destination is not reachable. The same can happen with other kinds of streaming destinations, for example when credentials are invalidated and not updated. We keep trying to send audit events to such destinations, and the resources consumed by those attempts are wasted. One instance of the problems caused by this worker can be seen in this S1 issue.
Proposal
We should track the count of failed attempts for each destination. If the count reaches a threshold, we should mark that destination as inactive and show this in the UI. The destination stays inactive until the namespace owner or instance admin marks it active again from the UI or through the API, at which point we should validate the destination again.
This approach is also followed in the industry wherever customers can add webhooks; one such example is chatbots.
A similar concern was raised 2 years ago in https://gitlab.com/gitlab-org/gitlab/-/issues/351019#note_823290576 and some good proposals were also made by @huzaifaiftikhar1.
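To make the proposed behaviour concrete, here is a minimal, language-agnostic sketch written in Python. It is not the GitLab implementation. The `Destination` class, the `FAILURE_THRESHOLD` value, and the `record_delivery` and `reactivate` helpers are all hypothetical names chosen for illustration. The sketch also assumes that a successful delivery resets the failure count, which the proposal does not specify.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical threshold; the issue does not propose a specific value.
FAILURE_THRESHOLD = 5


@dataclass
class Destination:
    """Hypothetical stand-in for a streaming destination record."""
    url: str
    active: bool = True
    failed_attempts: int = 0


def record_delivery(dest: Destination, success: bool) -> None:
    """Update the failure count after one delivery attempt."""
    if success:
        # Assumption: a successful delivery resets the count.
        dest.failed_attempts = 0
        return
    dest.failed_attempts += 1
    if dest.failed_attempts >= FAILURE_THRESHOLD:
        # Stop streaming to this destination until someone re-enables it.
        dest.active = False


def reactivate(dest: Destination, validate: Callable[[Destination], bool]) -> bool:
    """Re-enable a destination only if it passes validation again."""
    if not validate(dest):
        return False
    dest.active = True
    dest.failed_attempts = 0
    return True
```

In this sketch the worker would skip any destination whose `active` flag is false, and the UI or API action that marks a destination active would go through `reactivate`, so an unreachable host cannot be re-enabled without passing validation first.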
Design
Inactive stream badge and checkbox:
Active stream badge and checkbox: