Add recording rules for Service Level Indicators (SLIs)
Summary
Add a set of recording rules for Service Level Indicators (SLIs).
Proposal
So that users and the support team can quickly determine the health of a GitLab instance, we need a set of recording rules that can serve as Service Level Indicators:

- Service availability: What is the basic availability of each component?
- Rails capacity: How busy are the Rails worker process threads?
- Error rates/ratios: How many errors are being produced compared to the number of requests?
- Apdex: How fast are components responding?
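The ratio-style indicators above all reduce to simple arithmetic over counts that a recording rule would precompute. The recording rules themselves would be written in PromQL; this Python sketch only illustrates the arithmetic, assuming the raw counts (for example, increases over a fixed time window) are already available. The function names and inputs are hypothetical. Apdex follows its standard definition: requests at or under a target time T are "satisfied", those between T and 4T are "tolerating" and count half, and slower ones count zero. Undefined ratios return NaN, mirroring how PromQL evaluates 0 / 0.

```python
import math


def availability(up_targets: float, total_targets: float) -> float:
    """Fraction of a component's scrape targets that are up (hypothetical inputs)."""
    if total_targets <= 0:
        return math.nan  # no targets: availability is undefined
    return up_targets / total_targets


def capacity_utilization(busy_threads: float, max_threads: float) -> float:
    """Share of Rails worker threads that are busy (hypothetical inputs)."""
    if max_threads <= 0:
        return math.nan
    return busy_threads / max_threads


def error_ratio(errors: float, requests: float) -> float:
    """Errors divided by requests over the same window."""
    if requests <= 0:
        return math.nan  # no traffic: ratio is undefined, like 0 / 0 in PromQL
    return errors / requests


def apdex(satisfied: float, tolerating: float, total: float) -> float:
    """Standard Apdex score: (satisfied + tolerating / 2) / total."""
    if total <= 0:
        return math.nan
    return (satisfied + tolerating / 2) / total


print(error_ratio(5, 1000))           # 0.005
print(apdex(900, 60, 1000))           # 0.93
print(capacity_utilization(12, 16))   # 0.75
```

Returning NaN rather than 0 or 1 for an idle component keeps "no data" distinguishable from "perfectly healthy" or "completely broken" when these values feed dashboards or alerts.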
Latency data comes from a variety of sources, since each component can expose its own latency information. Latency also needs to be normalized: larger requests are expected to take more time, so we need metrics for the relative amount of work done per request.
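One way to normalize latency is to divide observed duration by a measure of the work each request performed. In this minimal Python sketch, the unit of work (for example, bytes returned or rows processed) and the function name are assumptions for illustration. It takes the ratio of totals rather than the average of per-request ratios, which matches how an average is usually derived from Prometheus `_sum` and `_count` series, so larger requests carry proportionally more weight.

```python
import math
from typing import Iterable, Tuple


def latency_per_unit(samples: Iterable[Tuple[float, float]]) -> float:
    """Seconds per unit of work across (duration_seconds, work_units) samples.

    Hypothetical helper: the definition of a "work unit" is an assumption.
    """
    total_seconds = 0.0
    total_units = 0.0
    for duration, units in samples:
        total_seconds += duration
        total_units += units
    if total_units <= 0:
        return math.nan  # no work recorded: normalized latency is undefined
    return total_seconds / total_units


# The second request takes three times as long, but it also does three times
# the work, so both run at the same normalized latency.
print(latency_per_unit([(0.5, 10), (1.5, 30)]))  # 0.05
```

A raw latency SLI would flag the 1.5-second request as slow, while the normalized value shows that the component performed equally well on both requests.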