AI clients (Vertex AI, Anthropic, OpenAI, etc.) should emit SLI metrics after exponential backoff/retry
In !127753 (merged), some metrics were added to measure the success rate of AI client calls.
These metrics measure the rate before exponential backoff/retry is included.
For some AI clients, the success rate is currently as low as 60%. However, this doesn't reflect the user experience: the metric does not record how many user interactions actually fail.
For a good SLI, we need to measure the user's experience. The current metric does not do this, because several retry attempts may mitigate the problem. Ideally, the post-retry success rate should be 99% or better.
We should add measurements around the retry_with_exponential_backoff method, so that we can build an SLI that reflects user experience and can be alerted on, notifying teams in the event that we're unable to achieve a target SLO.
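As a rough illustration of measuring outside the retry loop, here is a minimal Ruby sketch. Everything in it is hypothetical: SliCounter is an in-memory stand-in for a real metrics client, retry_with_backoff is a simplified stand-in for retry_with_exponential_backoff (whose actual signature and behaviour may differ), and measured_call is an invented wrapper name. The point is only that one observation is recorded per user-facing call, however many attempts it took.

```ruby
# Hypothetical in-memory counter standing in for a real metrics client.
class SliCounter
  attr_reader :total, :errors

  def initialize
    @total = 0
    @errors = 0
  end

  def increment(error:)
    @total += 1
    @errors += 1 if error
  end

  def success_ratio
    return 1.0 if @total.zero?

    (@total - @errors).to_f / @total
  end
end

# Simplified stand-in for retry_with_exponential_backoff (assumption: the
# real method's signature and retry conditions may differ).
def retry_with_backoff(max_attempts: 3, base_delay: 0.1)
  attempt = 0
  begin
    attempt += 1
    yield
  rescue StandardError
    # Give up and re-raise once the attempts are exhausted.
    raise if attempt >= max_attempts

    sleep(base_delay * (2**(attempt - 1)))
    retry
  end
end

# Records exactly one SLI observation per call, after all retries.
def measured_call(counter, **retry_options, &block)
  result = retry_with_backoff(**retry_options, &block)
  counter.increment(error: false)
  result
rescue StandardError
  counter.increment(error: true)
  raise
end

# Usage: two transient failures followed by a success count as one success.
demo_counter = SliCounter.new
demo_attempts = 0
measured_call(demo_counter, base_delay: 0) do
  demo_attempts += 1
  raise 'transient failure' if demo_attempts < 3

  :ok
end
puts demo_counter.success_ratio # prints 1.0
```

A per-attempt metric would have recorded a 33% success rate for the same interaction, which is the gap between the current metric and the proposed SLI.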
It's also important that we measure latency, using an Apdex with a threshold latency target (for example, 20s or less). If calls succeed but take several minutes to return, leading to user frustration, this should be reflected in the Apdex SLI. Like the error rate SLI, this measurement should be taken outside the exponential backoff/retry logic, as the user doesn't care how many attempts it took to get a response, only that they got one in a timely manner.
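The latency measurement could look something like the sketch below. ApdexTracker is a hypothetical class, not an existing GitLab API, and the sketch makes two assumptions worth stating: a call counts as satisfactory when it completes within the threshold (20s here, taken from the example above), and failed calls are left to the error rate SLI rather than counted in the Apdex. The clock is injectable only so the behaviour can be checked without real waiting.

```ruby
# Hypothetical Apdex tracker. The block passed to #measure should contain
# the whole retried operation, so the duration includes backoff sleeps.
class ApdexTracker
  attr_reader :total, :satisfied

  def initialize(threshold_seconds: 20.0,
                 clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @threshold_seconds = threshold_seconds
    @clock = clock
    @total = 0
    @satisfied = 0
  end

  # Times the block and records the result only if it succeeds
  # (assumption: failures are covered by the error rate SLI instead).
  def measure
    started_at = @clock.call
    result = yield
    record(@clock.call - started_at)
    result
  end

  # Fraction of successful calls that finished within the threshold.
  def score
    return 1.0 if @total.zero?

    @satisfied.to_f / @total
  end

  private

  def record(duration)
    @total += 1
    @satisfied += 1 if duration <= @threshold_seconds
  end
end

# Usage with a fake clock: one call takes 5s, another takes 30s.
fake_times = [0.0, 5.0, 100.0, 130.0]
demo_tracker = ApdexTracker.new(clock: -> { fake_times.shift })
demo_tracker.measure { :fast_response }
demo_tracker.measure { :slow_response }
puts demo_tracker.score # prints 0.5
```

Both calls succeeded, so an error rate SLI alone would report 100%, while the Apdex shows that half of the interactions were too slow.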
Once this SLI is in place, it should remove the need for per-feature monitoring, e.g. #414852 (closed), as this is done across all clients rather than implemented on each separately. This allows for more consistent monitoring and lower cognitive load for operators.
What about doing this in the AI Gateway?
I think we should measure in both the AI Gateway and in Rails:
- Having the measurement in Rails will help detect problems between Rails and the AI Gateway.
- Self-managed customers don't have access to the AI Gateway metrics.
- Measuring in Rails offers the possibility of better attribution of the metrics at that level (for example, per feature_category).
@mbursi @lmcandrew @jprovaznik @hmerscher @ghavenga @reprazent @nmccorrison