Update pending jobs queue size in runner dashboards
Until now, the ci-runners dashboards used the gitlab_ci_queue_size_total_bucket histogram to estimate the number of jobs in the queue. That works, but it's not the best metric. A histogram works with strictly defined buckets, so the displayed number is not only an estimate (which, given how the size is counted on the GitLab side, is already not ideal), but is also heavily interpolated based on the histogram's bucket definition.
Fortunately, we also have the gitlab_ci_current_queue_size gauge, which reports the current queue size directly. Using it removes the inaccurate interpolation from the displayed counts.
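To illustrate why the histogram-based number can drift from reality, here is a minimal sketch of the linear interpolation that a Prometheus-style quantile estimate performs inside a bucket. The bucket boundaries (10, 100, 1000) and the observed queue size (20) are made up for the example and are not the real bucket definition of the GitLab metric. It also assumes the old panels derived their value from a quantile-style query, which this description does not confirm.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate the q-quantile from cumulative buckets.

    buckets: list of (upper_bound, cumulative_count), sorted by bound,
    with the last bound being +Inf. Values inside a bucket are assumed
    to be spread evenly, which is where the interpolation error comes from.
    """
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for upper, count in buckets:
        if count >= rank:
            if math.isinf(upper) or count == prev_count:
                # No upper bound (or an empty bucket) to interpolate within.
                return prev_bound
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (upper - prev_bound) * fraction
        prev_bound, prev_count = float(upper), float(count)
    return prev_bound


# Hypothetical data: five measurements, each reporting a queue size of 20.
# All of them land in the (10, 100] bucket.
buckets = [(10.0, 0), (100.0, 5), (1000.0, 5), (math.inf, 5)]

histogram_estimate = histogram_quantile(0.5, buckets)
gauge_value = 20  # a gauge would simply expose the last measured size

print(histogram_estimate, gauge_value)
```

With these made-up buckets the histogram reports 55 although every measurement was 20. The estimate depends on where the bucket boundaries happen to sit, while the gauge exposes the measured value as is.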
That leaves us only with the estimation inaccuracy created by GitLab's counting.
To give some background: GitLab doesn't store separate queues for different runners, or even for different runner types. All jobs are queued in a single database table, and the queue is built by querying this table.
We query for jobs targeting a specific runner type only when a runner of that type asks for a job. At that moment we know how many jobs target this runner type, but a job on that list may also target runners of other types. That's the main cause of the inaccuracy.
Moreover, if all runners of a specific kind were paused (for example, all instance runners), the measurement for instance_type would drop to zero, because no instance_type runner would trigger the counting mechanism, even though the queue would definitely be growing at that time.
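Both limitations can be sketched with a small toy model. This is not GitLab's implementation: the `Job` class, the `count_for` and `measure` helpers, and the rule that a type without a requesting runner reports zero are all assumptions made to mirror the behavior described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    job_id: int
    runner_types: frozenset  # runner types that could pick this job up


def count_for(runner_type, pending):
    """Count pending jobs that a runner of the given type could pick up."""
    return sum(1 for job in pending if runner_type in job.runner_types)


def measure(requesting_types, pending):
    """Per-type queue sizes, counted only for types whose runners asked for a job.

    Assumption: a type with no requesting runner shows up as zero.
    """
    all_types = {t for job in pending for t in job.runner_types}
    return {
        t: count_for(t, pending) if t in requesting_types else 0
        for t in sorted(all_types)
    }


# One shared "table" of pending jobs; job 2 can run on either runner type.
pending = [
    Job(1, frozenset({"instance_type"})),
    Job(2, frozenset({"instance_type", "project_type"})),
    Job(3, frozenset({"project_type"})),
]

# Runners of both types are asking for jobs.
active = measure({"instance_type", "project_type"}, pending)

# All instance runners are paused, so nothing triggers their count.
paused = measure({"project_type"}, pending)

print(active, sum(active.values()), len(pending))
print(paused, count_for("instance_type", pending))
```

In the first case the per-type sizes add up to 4 although only 3 jobs are pending, because job 2 is counted for both types. In the second case instance_type reads zero while two jobs are still waiting for an instance runner.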
Even with all these limitations, this metric is still a good enough estimate of the state of the queue and can provide useful information when working on incidents, especially for seeing whether the actions taken are making any positive difference.
With that in mind, switching from the histogram to the gauge metric will improve that experience further.
Dashboard previews:
URL | Dashboard name |
---|---|
https://dashboards.gitlab.net/dashboard/snapshot/0eqfZZBi9dZWQZukUNFIfOU1pveg2xHs | ci-runners: Deployment overview |
https://dashboards.gitlab.net/dashboard/snapshot/7XqFh9sZHHGFdK6DSfA3AnolXLs6tb2t | ci-runners: Incident Support: autoscaling |
https://dashboards.gitlab.net/dashboard/snapshot/H7urizaCccgq60jCzEEPuUK5J39iE7Tw | ci-runners: Incident Support: autoscaling-new |
https://dashboards.gitlab.net/dashboard/snapshot/oGUW36DaS4JoW0jwpEzpX1oqfcQHqgKv | ci-runners: Incident Support: database |
https://dashboards.gitlab.net/dashboard/snapshot/8UNFRXp4f2yJCCUTPxFSrSUOQQ8wClFi | ci-runners: Incident Support: gitlab-application |
https://dashboards.gitlab.net/dashboard/snapshot/IE3iZR6phex4GoRDQUiBo67bXV7BkdxA | ci-runners: Incident Support: runner-manager |