Update pending jobs queue size in runner dashboards

Until now, the ci-runners dashboards used the gitlab_ci_queue_size_total_bucket histogram to estimate the number of jobs in the queue. That works, but it's not the best metric. A histogram works with strictly defined buckets, so the displayed number is not only an estimate (which, given how we count the queue size on the GitLab side, is already not ideal) but is also heavily interpolated based on the histogram's bucket definition.

Fortunately, we also have the gitlab_ci_current_queue_size gauge, which reports the current queue size directly. That removes the inaccurate interpolation from the displayed counts.
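To illustrate why a histogram-derived number can drift from the real value, here is a minimal Python sketch of bucket-based quantile estimation with linear interpolation, in the manner of Prometheus' histogram_quantile. The bucket bounds (10, 100, 1000, +Inf) and the observed value of 120 are hypothetical and chosen only for illustration; they are not the actual bucket definition of the GitLab metric, and the function is not the query used by the dashboards.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate quantile q from cumulative buckets.

    buckets: list of (upper_bound, cumulative_count) pairs, where the
    last bucket has an upper bound of +Inf. The value is linearly
    interpolated inside the bucket that contains the requested rank.
    """
    buckets = sorted(buckets)
    total = buckets[-1][1]
    if total == 0:
        return math.nan

    rank = q * total
    prev_bound, prev_count = 0.0, 0
    for bound, count in buckets:
        if count >= rank:
            in_bucket = count - prev_count
            if math.isinf(bound) or in_bucket == 0:
                # No finite upper bound (or an empty bucket) to interpolate in.
                return prev_bound
            fraction = (rank - prev_count) / in_bucket
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return prev_bound


# Hypothetical scenario: 50 observations, each reporting a queue size of 120.
# With the assumed bounds, all of them land in the (100, 1000] bucket.
observed_queue_size = 120  # what a gauge would report directly
buckets = [(10, 0), (100, 0), (1000, 50), (math.inf, 50)]

estimate = histogram_quantile(0.5, buckets)
print(f"gauge: {observed_queue_size}, histogram median estimate: {estimate}")
```

In this made-up case the histogram estimate lands in the middle of the bucket (550) even though every observation was 120, because the only information the histogram keeps is which bucket each observation fell into.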

That leaves us only with the estimation inaccuracy created by GitLab's counting.

To give some background: GitLab doesn't store separate queues for different runners, or even for different runner types. All jobs are queued in a single database table, and the queue is built by querying this table.

We query for jobs targeting a specific runner type only when a runner of that type asks for a job. At that moment we know how many jobs are targeting this specific runner type, but a job on that list may also target runners of other types. That's the main cause of the inaccuracy.
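The overlap described above can be sketched with a small Python model. This is a simplified illustration, not GitLab's actual implementation: the Job class, the count_for_runner_type helper, and the sample jobs are all hypothetical, and the single list stands in for the shared database table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    name: str
    # Runner types that are able to pick up this job (hypothetical field).
    eligible_runner_types: frozenset


def count_for_runner_type(pending_jobs, runner_type):
    """Count pending jobs that a runner of the given type could pick up.

    Models the count taken when a runner of that type asks for a job.
    """
    return sum(1 for job in pending_jobs if runner_type in job.eligible_runner_types)


# One shared list of pending jobs, standing in for the single queue table.
pending_jobs = [
    Job("a", frozenset({"instance_type", "group_type"})),
    Job("b", frozenset({"instance_type"})),
    Job("c", frozenset({"group_type", "project_type"})),
]

counts = {
    runner_type: count_for_runner_type(pending_jobs, runner_type)
    for runner_type in ("instance_type", "group_type", "project_type")
}

print(counts)
print("sum of per-type counts:", sum(counts.values()))
print("actual pending jobs:", len(pending_jobs))
```

Because jobs "a" and "c" are each eligible for two runner types, they are counted once per type, so the per-type numbers add up to more than the number of jobs actually waiting.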

Moreover, if all runners of a specific kind were paused (for example, all instance runners), the measurement for instance_type would drop to zero, because no instance_type runner would trigger the counting mechanism, even though the queue would definitely grow during that time.

Even with all these limitations, this metric is still a good enough estimate of the state of the queue and can provide useful information during incidents, especially for seeing whether the actions taken are making a positive difference.

With that in mind, switching from the histogram to the gauge metric will improve that experience further.

Dashboard previews:

| URL | Dashboard name |
| --- | --- |
| https://dashboards.gitlab.net/dashboard/snapshot/0eqfZZBi9dZWQZukUNFIfOU1pveg2xHs | ci-runners: Deployment overview |
| https://dashboards.gitlab.net/dashboard/snapshot/7XqFh9sZHHGFdK6DSfA3AnolXLs6tb2t | ci-runners: Incident Support: autoscaling |
| https://dashboards.gitlab.net/dashboard/snapshot/H7urizaCccgq60jCzEEPuUK5J39iE7Tw | ci-runners: Incident Support: autoscaling-new |
| https://dashboards.gitlab.net/dashboard/snapshot/oGUW36DaS4JoW0jwpEzpX1oqfcQHqgKv | ci-runners: Incident Support: database |
| https://dashboards.gitlab.net/dashboard/snapshot/8UNFRXp4f2yJCCUTPxFSrSUOQQ8wClFi | ci-runners: Incident Support: gitlab-application |
| https://dashboards.gitlab.net/dashboard/snapshot/IE3iZR6phex4GoRDQUiBo67bXV7BkdxA | ci-runners: Incident Support: runner-manager |
Edited by Tomasz Maczukin