Add an inference error rate SLI
We currently track only inference apdex in our metrics catalog for alerting and health checks. We should add an error rate SLI for inference as well. This way, when a single model becomes unavailable, we'll be alerted even if the request error ratio does not exceed thresholds.
There's a start of this implementation in !699 (closed), but it needs refactoring.
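To illustrate the motivation, here is a minimal Python sketch of how one unavailable model can stay hidden in an aggregate error ratio. It assumes the new SLI would be evaluated per model, which this issue does not spell out. The model names, request counts, and the 1% threshold are hypothetical, and this is not the metrics catalog's actual definition format.

```python
from dataclasses import dataclass


@dataclass
class ModelCounts:
    """Request and error counts for one model over some window (hypothetical)."""
    requests: int
    errors: int


def error_ratio(requests: int, errors: int) -> float:
    """Fraction of requests that failed; 0.0 when there was no traffic."""
    return errors / requests if requests else 0.0


def breaching_models(counts: dict[str, ModelCounts], threshold: float) -> list[str]:
    """Names of models whose own error ratio exceeds the threshold."""
    return sorted(
        name
        for name, c in counts.items()
        if error_ratio(c.requests, c.errors) > threshold
    )


# Hypothetical traffic: two busy, healthy models and one low-traffic model
# that fails every request.
counts = {
    "model-a": ModelCounts(requests=10_000, errors=5),
    "model-b": ModelCounts(requests=10_000, errors=5),
    "model-c": ModelCounts(requests=20, errors=20),
}

THRESHOLD = 0.01  # assumed 1% error-ratio threshold, for illustration only

total_requests = sum(c.requests for c in counts.values())
total_errors = sum(c.errors for c in counts.values())
aggregate = error_ratio(total_requests, total_errors)

print(f"aggregate error ratio: {aggregate:.4f}")
print(f"models over threshold: {breaching_models(counts, THRESHOLD)}")
```

With these made-up numbers, the aggregate ratio stays under the threshold while the per-model check flags the failing model, which is the gap an inference error rate SLI is meant to close.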