Draft: Implement watcher for Vertex API rate limit error occurrences (!6076) · Merge requests · GitLab.com / Runbooks

Gregory Havenga requested to merge ghavenga-implement-rate-limit-error-watcher into master Jul 28, 2023

Implement a watcher in ElasticSearch that watches for incidents of the exponential back-off RateLimitError resulting from too many failed attempts to contact an external AI API. (In particular, VertexAI).

It is possible that we may want to expand the scope of this and send alerts to the AI framework team as an unresponsive Vertex API may be a problem for more than just Threat Insights.

Ideally this would be threshold tied to ongoing requests to avoid spamming the channel with false positives. X% of at least Y results in the past period would be the preferable approach.

Draft: Implement watcher for Vertex API rate limit error occurrences

Merge request reports