Add apdex measurements per request labelled with feature category and endpoint id

We're currently using the http_request_duration_seconds histogram.

Histograms have a high cardinality because the number of label combinations is multiplied by the number of buckets. As a result, all requests need to be graded the same way: we can't add a bucket for every duration threshold that requests need while also making the source of the apdex success visible by adding the endpoint_id or feature_category label. This is why we're using http_request_duration_seconds as an SLI for the service, and gitlab_transaction_duration_seconds for the stage group's SLI. The two histograms have different buckets and labels, but because the latter has fewer buckets we can add the endpoint information to it.
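To make the cardinality argument concrete, here is a rough back-of-the-envelope sketch. The bucket count is a made-up assumption and the helper names are hypothetical; only the figure of roughly 3k endpoints comes from this issue. It also assumes that each endpoint maps to exactly one feature_category, so that label does not multiply the series count further.

```ruby
# Rough time-series estimate. All numbers are illustrative assumptions
# except the ~3k endpoints mentioned in this issue.
ENDPOINT_COUNT = 3_000

# A Prometheus histogram exposes one series per bucket (including +Inf),
# plus a _sum and a _count series, for every label combination.
def histogram_series(label_combinations, buckets)
  label_combinations * (buckets + 2)
end

# A counter exposes a single series per label combination.
def counter_series(label_combinations, counters)
  label_combinations * counters
end

hypothetical_buckets = 12 # assumed bucket count, including +Inf

puts histogram_series(ENDPOINT_COUNT, hypothetical_buckets) # 3000 * 14 = 42000
puts counter_series(ENDPOINT_COUNT, 2)                      # 3000 * 2  = 6000
```

Under these assumptions, two counters per endpoint stay well below a per-endpoint histogram, and the gap grows with every bucket added.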

To unify this, and to make sure we track the apdex for requests the same way for the service as we do for the stage group's error budget, we could use two counters defined using the methods discussed in #1221 (closed). These counters have an endpoint_id and a feature_category label.

  1. gitlab_sli:rails_request_apdex:total: incremented for every request that did not error
  2. gitlab_sli:rails_request_apdex:success_total: incremented for every request that completed with a satisfactory duration. This should never be incremented without first incrementing the counter in 1.
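
The relationship between the two counters could be sketched as follows. This is a minimal in-memory illustration, not the Prometheus client or the API proposed in #1221: `SketchCounter`, `record_apdex`, and the example label values are hypothetical, and treating a duration equal to the target as satisfactory is an assumption.

```ruby
# In-memory stand-in for a labelled counter; not a real Prometheus client API.
class SketchCounter
  def initialize
    @values = Hash.new(0)
  end

  def increment(labels)
    @values[labels] += 1
  end

  def get(labels)
    @values[labels]
  end
end

TOTAL = SketchCounter.new         # stands in for gitlab_sli:rails_request_apdex:total
SUCCESS_TOTAL = SketchCounter.new # stands in for gitlab_sli:rails_request_apdex:success_total

# Records one request that did not error; errored requests are never passed in.
# The total is always incremented before the success counter.
def record_apdex(endpoint_id:, feature_category:, duration_s:, target_s: 1.0)
  labels = { endpoint_id: endpoint_id, feature_category: feature_category }
  TOTAL.increment(labels)
  SUCCESS_TOTAL.increment(labels) if duration_s <= target_s
end

# Hypothetical label values, for illustration only.
record_apdex(endpoint_id: "ExampleController#show", feature_category: "example", duration_s: 0.3)
record_apdex(endpoint_id: "ExampleController#show", feature_category: "example", duration_s: 2.5)

labels = { endpoint_id: "ExampleController#show", feature_category: "example" }
apdex = SUCCESS_TOTAL.get(labels).to_f / TOTAL.get(labels)
puts apdex # one satisfactory request out of two
```

Because the success counter is only ever incremented together with the total, the ratio of the two stays between 0 and 1 for every endpoint_id and feature_category.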

We do need to pay attention when rolling this out: at the time of writing there were slightly under 3k different endpoints (API + Controllers). So let's keep the use of these metrics behind a feature flag. We could set the success target to 1 second in this iteration, until we implement #1223 (closed).
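
A minimal sketch of gating the grading behind a flag, using the fixed 1 second target. `SketchFlags` and the flag name `:request_apdex_counters` are hypothetical; this does not use GitLab's actual feature-flag API.

```ruby
# Hypothetical flag store; GitLab's real feature-flag API is not used here.
class SketchFlags
  def initialize
    @enabled = {}
  end

  def enable(name)
    @enabled[name] = true
  end

  def enabled?(name)
    @enabled.fetch(name, false)
  end
end

APDEX_TARGET_SECONDS = 1.0 # first-iteration target named in this issue

# Returns nil while the flag is off (nothing is recorded); otherwise returns
# whether the request counts as satisfactory against the fixed target.
def apdex_outcome(flags, duration_s)
  return nil unless flags.enabled?(:request_apdex_counters) # hypothetical flag name

  duration_s <= APDEX_TARGET_SECONDS
end

flags = SketchFlags.new
p apdex_outcome(flags, 0.4) # flag off, nothing graded
flags.enable(:request_apdex_counters)
p apdex_outcome(flags, 0.4) # within the 1 second target
p apdex_outcome(flags, 1.8) # slower than the target
```

Keeping the target in one constant means a later per-endpoint target, as planned in #1223, only has to replace that single lookup.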

Future move to labkit, not a requirement

Most likely, this issue will also require the implementation of the result of #1221 (closed), since these would be the first metrics defined like that. I think we should do that inside the GitLab-rails codebase at first, and move it to labkit-ruby in a later iteration (should be a no-op).
Edited by Bob Van Landuyt