Analysis of frequency and duration of specific AI Gateway errors with code 429
Problem
Code Completion requests for the model vertex_ai/codestral-2501 from users of self-managed instances sometimes fail with errors like the following (example in Kibana, saved as JSON):
litellm.RateLimitError: litellm.RateLimitError: VertexAIException - HTTPStatusError - {
"error": {
"code": 429,
"message": "Resource exhausted. Please try again later. Please refer to https://cloud.google.com/vertex-ai/generative-ai/docs/error-code-429 for more details.",
"status": "RESOURCE_EXHAUSTED"
}
Request
How can we determine how often these specific errors occur (as distinct from other 429 errors) and how long each burst of them typically lasts, over a period longer than the 7 days available in Kibana?
This information will help us decide what action to take in response to these errors (see gitlab-com/gl-infra/production-engineering#26755 (comment 2501586602)).
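Once log records covering a longer period are available from some source, the counting itself could look roughly like the sketch below. It is only an illustration under assumptions: the input format (pairs of ISO 8601 timestamp and raw message), the function names, and the 5-minute gap used to decide when one burst of errors ends are all hypothetical and not taken from any existing tooling. The marker strings come from the example error above.

```python
from datetime import datetime, timedelta

# Marker strings taken from the example error message in this issue.
MARKERS = ("VertexAIException", "RESOURCE_EXHAUSTED")


def is_vertex_resource_exhausted(message: str) -> bool:
    """True if the message looks like the specific Vertex AI 429 error."""
    return all(marker in message for marker in MARKERS)


def group_into_episodes(timestamps, max_gap=timedelta(minutes=5)):
    """Group event times into episodes separated by more than max_gap.

    Returns a list of (start, end, count) tuples. The 5-minute default
    is an arbitrary assumption and should be tuned.
    """
    episodes = []
    for ts in sorted(timestamps):
        if episodes and ts - episodes[-1][1] <= max_gap:
            start, _, count = episodes[-1]
            episodes[-1] = (start, ts, count + 1)
        else:
            episodes.append((ts, ts, 1))
    return episodes


def summarize(records, max_gap=timedelta(minutes=5)):
    """Summarize (timestamp, message) pairs (hypothetical export format).

    Timestamps are ISO 8601 strings with an explicit offset,
    for example "2025-05-01T10:00:00+00:00".
    """
    times = [
        datetime.fromisoformat(ts)
        for ts, message in records
        if is_vertex_resource_exhausted(message)
    ]
    episodes = group_into_episodes(times, max_gap)
    durations = [end - start for start, end, _ in episodes]
    return {
        "error_count": len(times),
        "episode_count": len(episodes),
        "longest_episode": max(durations, default=timedelta(0)),
    }
```

Matching on both marker strings is what separates these errors from other 429 responses. Grouping by a maximum gap gives an approximate duration per burst, and the result depends on the gap chosen.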