Add AI SLIs to metrics catalog and add alerting for them
Related to gitlab-org/gitlab#421546 (comment 1536745776)
In gitlab-org/gitlab!129367 (merged) and gitlab-org/gitlab!130395 (merged) we added SLIs to monitor duration and error rate of AI requests. It would be great to add these to the metrics catalog and add alerting on them.
The application SLI that was added is called `llm_completion`. It measures the entire interaction with an AI provider for a specific feature. It is labelled with `feature_category` and `service_class` to signify which feature we're talking about. The metrics are emitted from Sidekiq.
We should add the application SLI to the library so we can see the results in a global aggregate and determine an appropriate SLO.
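For the global aggregate, the usual approach is to sum the good and total counts across all features before dividing, rather than averaging per-feature ratios. Averaging would let a low-traffic feature skew the result. The sketch below shows this, plus a naive way to compare the observed ratio against candidate SLOs and check the remaining error budget. The candidate list and function names are hypothetical. A real SLO choice should also look at how the ratio varies over time, not at a single snapshot.

```python
def global_ratio(per_feature_counts):
    """per_feature_counts: {feature: (good, total)}. Sum counts, not ratios."""
    good = sum(g for g, _ in per_feature_counts.values())
    total = sum(t for _, t in per_feature_counts.values())
    return good / total if total else None


def highest_met_slo(observed_ratio, candidates=(0.9, 0.95, 0.99, 0.995)):
    """Pick the strictest candidate SLO the observed ratio already meets."""
    met = [c for c in candidates if observed_ratio >= c]
    return max(met) if met else None


def error_budget_remaining(good, total, slo):
    """Fraction of the error budget left for a given SLO (1.0 = untouched)."""
    allowed_bad = (1 - slo) * total
    actual_bad = total - good
    return 1 - actual_bad / allowed_bad if allowed_bad else None
```

With `{"a": (90, 100), "b": (990, 1000)}` the global ratio is 1080/1100 (about 0.98). A naive mean of the two per-feature ratios would give 0.945, which understates how the system behaves for most requests.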
We should also add this SLI to a service so we get alerting for it. Ideally, this would be a client-side SLI for the AI gateway. But right now, requests aren't routed through the AI gateway yet, and I don't think we should wait for that. Should we add the SLI to Sidekiq for now?
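Whichever service ends up owning the SLI, the alerting logic itself does not change: it compares an observed error ratio against the error budget implied by the SLO. The sketch below shows a multi-window, multi-burn-rate check. It uses the 1h/5m window pair at 14.4x burn and the 6h/30m pair at 6x burn, which the Google SRE workbook popularized. Whether GitLab's alerting uses exactly these windows and factors for this SLI is an assumption. The `WindowRule` type and `should_alert` function are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WindowRule:
    long_window: str
    short_window: str
    burn_rate: float


# Assumed rules: a fast burn (1h/5m at 14.4x) and a slower burn (6h/30m at 6x).
RULES = (
    WindowRule("1h", "5m", 14.4),
    WindowRule("6h", "30m", 6.0),
)


def should_alert(error_ratios, slo, rules=RULES):
    """error_ratios: {window: observed error ratio}; slo: e.g. 0.99.

    Fires when, for any rule, both the long and the short window burn the
    error budget faster than the rule's burn rate. The short window lets
    the alert reset quickly once the errors stop.
    """
    budget = 1 - slo
    for rule in rules:
        threshold = rule.burn_rate * budget
        if (error_ratios[rule.long_window] > threshold
                and error_ratios[rule.short_window] > threshold):
            return True
    return False
```

With a 99% SLO, the fast rule fires once both the 1h and 5m error ratios exceed about 14.4%. A spike that has already subsided in the 5m window does not page.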