
refactor(alerts): vault audit log failures

Steve Xuereb requested to merge refactor/vault-aduit-failure into master

What

  • Delete the VaultAuditLogRequestFailure alert.
  • Create a new SLI for vault_audit_log_request with an error ratio.
  • Create a new SLI for vault_audit_log_response with an error ratio.
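For readers unfamiliar with the term, here is a minimal sketch of what an error ratio measures, assuming it is simply failed requests divided by total requests over a window. The function names and the 5% threshold are hypothetical illustrations, not values from this merge request, and the actual SLI definition may differ.

```python
# Hypothetical threshold for illustration only; not taken from this MR.
ERROR_RATIO_THRESHOLD = 0.05


def error_ratio(errors: float, total: float) -> float:
    """Return the fraction of requests that failed in a window.

    Returns 0.0 when there was no traffic, to avoid dividing by zero.
    """
    if total <= 0:
        return 0.0
    return errors / total


def should_alert(errors: float, total: float,
                 threshold: float = ERROR_RATIO_THRESHOLD) -> bool:
    """Alert only when the error ratio exceeds the threshold."""
    return error_ratio(errors, total) > threshold
```

Under these assumptions, a single failed request out of 1,000 gives a ratio of 0.001 and does not alert, whereas an alert keyed on the raw failure count could fire on it.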

Why

The VaultAuditLogRequestFailure alert was too sensitive: it paged the on-call for small blips of failed requests.

None of these pages were actionable.

We could tune the existing alert to make it less sensitive, but we already have an established pattern for alerting on error spikes, so this change follows that pattern instead.

An argument could be made that we shouldn't look at the ratio; however, I think it's fine in this case since it's a low-request service.
