Add apdex measurements per request labelled with feature category and endpoint id
We're currently using the http_request_duration_seconds histogram.
Histograms have a higher cardinality because the cardinality is multiplied by the number of buckets. As a result, all requests need to be graded the same way: we can't add a bucket for every duration that requests need while also making the source of the apdex success visible by adding the endpoint_id or feature_category label. This is why we're using http_request_duration_seconds as an SLI for the service, and gitlab_transaction_duration_seconds for the stage group's SLI. The two histograms have different buckets and labels, but because the latter has fewer buckets, we can add the endpoint information to it.
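To make the cardinality argument concrete, here is a rough back-of-the-envelope sketch. It assumes the usual Prometheus histogram exposition (one series per finite bucket, one `+Inf` bucket, plus `_sum` and `_count` for every label combination) and ignores any labels other than the endpoint. The bucket count of 20 is a made-up number for illustration, not the real bucket layout of either histogram. The endpoint count is the approximate figure quoted in this issue.

```python
def histogram_series(label_combinations: int, finite_buckets: int) -> int:
    """Series exposed by one Prometheus histogram.

    Each label combination gets one `_bucket` series per finite bucket,
    one `+Inf` bucket, plus a `_sum` and a `_count` series.
    """
    return label_combinations * (finite_buckets + 1 + 2)


def counter_series(label_combinations: int, counters: int = 1) -> int:
    """Series exposed by plain counters: one per label combination each."""
    return label_combinations * counters


ENDPOINTS = 3_000          # roughly the endpoint count quoted in this issue
HYPOTHETICAL_BUCKETS = 20  # assumed bucket count, for illustration only

with_histogram = histogram_series(ENDPOINTS, HYPOTHETICAL_BUCKETS)
with_counters = counter_series(ENDPOINTS, counters=2)
print(with_histogram, with_counters)
```

Under these assumptions, labelling a 20-bucket histogram per endpoint costs more than ten times as many series as two plain counters with the same label.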
To unify this, and make sure we track the apdex for requests the same way for the service as we do for the stage group's error budget, we could use two counters defined using the methods discussed in #1221 (closed). These counters have an endpoint_id and a feature_category label.
1. gitlab_sli:rails_request_apdex:total: incremented for every request that did not error.
2. gitlab_sli:rails_request_apdex:success_total: incremented for every request that completed with a satisfactory duration. This should not be incremented without first incrementing the counter in 1.
We do need to pay attention when rolling this out: at the time of writing there were slightly under 3k different endpoints (API + Controllers). So let's keep the use of these metrics behind a feature flag. We could set the success target to 1 second in this iteration until we implement #1223 (closed).
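The counting rules and the rollout guard described above could look roughly like the following sketch. It is not the actual implementation: plain `collections.Counter` objects stand in for the two Prometheus counters, `record_request`, `apdex`, and the `flag_enabled` parameter are hypothetical names, the example endpoint and feature category are invented, and treating a duration exactly equal to the target as satisfactory is an assumption.

```python
from collections import Counter

APDEX_TARGET_SECONDS = 1.0  # success target proposed for this iteration

# Stand-ins for the two Prometheus counters, keyed by label values.
total = Counter()          # gitlab_sli:rails_request_apdex:total
success_total = Counter()  # gitlab_sli:rails_request_apdex:success_total


def record_request(endpoint_id, feature_category, duration_seconds,
                   errored, flag_enabled=True):
    """Record one request (hypothetical helper, not a real GitLab API)."""
    # Nothing is recorded while the feature flag is off, and errored
    # requests are left out of the apdex entirely.
    if not flag_enabled or errored:
        return
    labels = (endpoint_id, feature_category)
    total[labels] += 1
    # success_total is only ever incremented after total, as required.
    # Assumption: a duration equal to the target counts as satisfactory.
    if duration_seconds <= APDEX_TARGET_SECONDS:
        success_total[labels] += 1


def apdex(endpoint_id, feature_category):
    """Success ratio for one label combination, or None without data."""
    labels = (endpoint_id, feature_category)
    if total[labels] == 0:
        return None
    return success_total[labels] / total[labels]


# Invented endpoint and feature category, for illustration only.
record_request("ProjectsController#show", "projects", 0.2, errored=False)
record_request("ProjectsController#show", "projects", 1.5, errored=False)
record_request("ProjectsController#show", "projects", 0.1, errored=True)
print(apdex("ProjectsController#show", "projects"))
```

Because both counters carry the same labels, the same pair can be aggregated by feature_category for the stage group's error budget or summed across all labels for the service SLI.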