
Add a total-requests metric for the image scaler

Matthias Käppler requested to merge mk/image-scaler-total-requests-metric into master

As mentioned in gitlab-com/runbooks#52 (comment 435059023), it is currently difficult to determine the failure rate of the image scaler, since it cannot easily be derived from our HTTP request metrics.

We therefore decided that it would be best to have a metric specific to the scaler component that tracks total request counts, broken down by a status label. The metric is gitlab_workhorse_image_resize_requests_total. The statuses I settled on are:

statusSuccess        = "success"        // a rescaled image was served
statusScalingFailure = "scaling-failed" // scaling failed but the original image was served
statusRequestFailure = "request-failed" // no image was served
statusUnknown        = "unknown"        // indicates an unhandled status case
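
To illustrate how the unknown status acts as a catch-all, here is a minimal sketch of mapping a request outcome to its label. The outcome type, its values, and the statusLabel function are hypothetical names invented for this example; this merge request description does not show how Workhorse represents outcomes internally.

```go
package main

// Status label values, copied from the list above.
const (
	statusSuccess        = "success"
	statusScalingFailure = "scaling-failed"
	statusRequestFailure = "request-failed"
	statusUnknown        = "unknown"
)

// outcome is a hypothetical enum used only in this sketch.
type outcome int

const (
	outcomeScaled         outcome = iota // a rescaled image was served
	outcomeServedOriginal                // scaling failed, original image was served
	outcomeFailed                        // no image was served
)

// statusLabel returns the status label for an outcome. Any outcome that is
// not handled explicitly falls through to statusUnknown, so every request is
// still counted under some label.
func statusLabel(o outcome) string {
	switch o {
	case outcomeScaled:
		return statusSuccess
	case outcomeServedOriginal:
		return statusScalingFailure
	case outcomeFailed:
		return statusRequestFailure
	default:
		return statusUnknown
	}
}
```

With a default branch like this, a non-zero count for status="unknown" would point to a code path that was not classified.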

This will allow us to obtain:

  1. The total number of requests: gitlab_workhorse_image_resize_requests_total (summed across all status values)
  2. The number of successfully rescaled images that were actually served: gitlab_workhorse_image_resize_requests_total{status="success"}
  3. The number of events where we failed over to the original size, thus resulting in potentially degraded client performance: gitlab_workhorse_image_resize_requests_total{status="scaling-failed"}
  4. The number of events that resulted in a broken user experience (think HTTP 500): gitlab_workhorse_image_resize_requests_total{status="request-failed"}

Soft and hard error rates can then be computed by dividing the count for the corresponding failure status (scaling-failed for soft errors, request-failed for hard errors) by the total count.
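
The division described above can be sketched as follows. This is only an illustration of the arithmetic over a snapshot of per-status counts: the errorRates function and its map input are assumptions made for this example, and in practice the rates would more likely be computed by a query in the monitoring system than in Go.

```go
package main

import "errors"

// Status label values used for the two failure kinds, copied from the
// description above.
const (
	statusScalingFailure = "scaling-failed"
	statusRequestFailure = "request-failed"
)

// errorRates is a hypothetical helper. It takes request counts keyed by
// status label and returns the soft error rate (scaling failed, original
// served) and the hard error rate (no image served), each as a fraction of
// all requests.
func errorRates(counts map[string]float64) (soft, hard float64, err error) {
	var total float64
	for _, c := range counts {
		total += c
	}
	if total == 0 {
		// Avoid dividing by zero when no requests have been recorded.
		return 0, 0, errors.New("no image resize requests recorded")
	}
	soft = counts[statusScalingFailure] / total
	hard = counts[statusRequestFailure] / total
	return soft, hard, nil
}
```

For example, counts of 90 success, 8 scaling-failed, and 2 request-failed give a soft error rate of 0.08 and a hard error rate of 0.02.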

Edited by Matthias Käppler
