Analysis of frequency and duration of specific AI Gateway errors with code 429

Problem

Code Completion requests for the model vertex_ai/codestral-2501 from users of self-managed instances sometimes fail with errors like the following (example in Kibana, saved as JSON):

litellm.RateLimitError: litellm.RateLimitError: VertexAIException - HTTPStatusError - {
  "error": {
    "code": 429,
    "message": "Resource exhausted. Please try again later. Please refer to https://cloud.google.com/vertex-ai/generative-ai/docs/error-code-429 for more details.",
    "status": "RESOURCE_EXHAUSTED"
  }

Request

How can we determine how frequently these specific errors occur (as opposed to other 429 errors) and how long they typically last, over a period longer than the 7 days available in Kibana?

This information will help us decide what action to take in response to these errors (see gitlab-com/gl-infra/production-engineering#26755 (comment 2501586602)).
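
As a starting point for discussion, here is a minimal sketch of how frequency and duration could be computed once the logs are exported from some longer-retention source. It is not tied to any particular system: the record shape (`timestamp`, `message`), the function names, and the 5-minute gap used to decide when one error episode ends are all assumptions made for illustration. The only detail taken from the error above is the match on `VertexAIException` plus `RESOURCE_EXHAUSTED`, which separates these errors from other 429s.

```python
from datetime import datetime, timedelta

# Hypothetical record shape: {"timestamp": ISO-8601 string, "message": str}.
# Field names in a real log export will likely differ.


def is_vertex_resource_exhausted(message: str) -> bool:
    """Match only the Vertex AI RESOURCE_EXHAUSTED variant of a 429."""
    return "VertexAIException" in message and "RESOURCE_EXHAUSTED" in message


def error_episodes(records, max_gap=timedelta(minutes=5)):
    """Group matching errors into episodes.

    Errors that are at most max_gap apart (an assumed threshold) are
    treated as part of the same episode.
    """
    times = sorted(
        datetime.fromisoformat(r["timestamp"])
        for r in records
        if is_vertex_resource_exhausted(r["message"])
    )
    episodes = []
    for t in times:
        if episodes and t - episodes[-1]["end"] <= max_gap:
            # Close enough to the previous error: extend the current episode.
            episodes[-1]["end"] = t
            episodes[-1]["count"] += 1
        else:
            # Gap too large (or first error): start a new episode.
            episodes.append({"start": t, "end": t, "count": 1})
    return episodes


def summarize(episodes):
    """Return episode count, total errors, and max/mean duration in seconds."""
    durations = [(e["end"] - e["start"]).total_seconds() for e in episodes]
    return {
        "episodes": len(episodes),
        "errors": sum(e["count"] for e in episodes),
        "max_duration_s": max(durations, default=0.0),
        "mean_duration_s": sum(durations) / len(durations) if durations else 0.0,
    }


# Made-up sample data, for illustration only.
sample_records = [
    {"timestamp": "2025-05-01T10:00:00",
     "message": "litellm.RateLimitError: VertexAIException - RESOURCE_EXHAUSTED"},
    {"timestamp": "2025-05-01T10:02:00",
     "message": "litellm.RateLimitError: VertexAIException - RESOURCE_EXHAUSTED"},
    {"timestamp": "2025-05-01T10:03:00",
     "message": "litellm.RateLimitError: some other 429"},
    {"timestamp": "2025-05-01T11:00:00",
     "message": "litellm.RateLimitError: VertexAIException - RESOURCE_EXHAUSTED"},
]

summary = summarize(error_episodes(sample_records))
```

With this definition, a single isolated error counts as an episode of zero duration, so the mean duration is sensitive to the chosen gap threshold. That threshold would need to be agreed on before comparing numbers.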