Provide estimated wait times for instance runners
Release notes
If a CI/CD job won't start, how do you know when it will start, or whether busy runners are the root cause? With this feature in the Admin Area, you can see the estimated wait time for all instance runners, which gives admins more insight into how busy their runners are.
Problem to solve
JTBD
When I am troubleshooting CI jobs, I want to quickly know whether the problem is related to the job execution agent, so I can resolve the problem and continue working.
User story
As a Platform Engineer who is checking the performance of CI jobs on an instance, I need to see how busy the runners are so that I can quickly determine whether there are performance issues with a runner and its underlying host system or platform.
Further details:
Developers sometimes report that their job is pending and ask the admin whether something is wrong with the runner. The admin has no insight into how many jobs are queued for the runner, so they cannot answer. From an infrastructure perspective, the admin may also see jobs for the same runner repeatedly sitting in the "pending" state, but without data on this they have to guess whether to create another runner to pick up those types of jobs.
Intended users
User experience goal
The user should be able to quickly see how long their instance runners are taking to pick up jobs.
Proposal
- Add a "performance insights" button.
- Add usage ping tracking to the button.
- The button should open a modal that shows the average time it takes for all instance runners to pick up a job.
- Designs in design management
- Figma file
Technical details
We know the wait time for each job, the runner that ran it, and the tags it used. To calculate the "estimated job wait time", we will average the wait time of the last X jobs run by that runner with the given tags.
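A minimal Python sketch of the averaging described above, assuming "with the given tags" means jobs whose tag set exactly matches, and that a higher job id means a more recent job. The `Job` record and `estimated_job_wait_time` function are hypothetical illustrations, not GitLab's schema or code; in practice this would be a database query against job records.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Optional


@dataclass(frozen=True)
class Job:
    """Hypothetical job record; field names are illustrative assumptions."""
    id: int                 # assumption: higher id = more recent job
    runner_id: int
    tags: FrozenSet[str]
    wait_seconds: float     # time from being queued until a runner picked it up


def estimated_job_wait_time(
    jobs: Iterable[Job], runner_id: int, tags: FrozenSet[str], last_x: int
) -> Optional[float]:
    """Average wait of the last `last_x` jobs run by `runner_id` with exactly `tags`."""
    # Assumption: tags must match exactly, not just overlap.
    matching = [j for j in jobs if j.runner_id == runner_id and j.tags == tags]
    # Newest first, then keep only the last X jobs.
    recent = sorted(matching, key=lambda j: j.id, reverse=True)[:last_x]
    if not recent:
        return None  # no history for this runner/tag combination
    return sum(j.wait_seconds for j in recent) / len(recent)
```

For example, if runner 7 ran three `docker`-tagged jobs with waits of 30 s, 10 s, and 20 s (oldest to newest), `estimated_job_wait_time(jobs, 7, frozenset({"docker"}), 2)` averages only the two newest and returns `15.0`. Returning `None` rather than `0` keeps "no data" distinguishable from "no wait".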
Available Tier
- Ultimate
- Tier rationale: At GitLab, we make feature tiering decisions based on the likely buyer persona and on who cares most about the feature. For Runner Fleet specifically, most of the feedback on the problems of managing runners at scale has come from users at organizations with hundreds to thousands of runners. These organizations are typically Ultimate plan customers, which is the primary reason for including Fleet management features in Ultimate. As we continue to get feedback from users and customers on these features, we will evaluate whether to move specific Fleet management features to Premium.
Feature Usage Metrics
Track page views of Admin Area > Runners via usage ping.
Backend implementation proposal
- Create a jobsMetrics GraphQL object under CiRunnerConnection (database query plan):
  runners(type: INSTANCE_TYPE) { jobsStatistics { queuedDuration { p50 } } }
- Limitations:
  - The API will be marked as alpha, since it is not yet clear whether it will match future iterations of the functionality (although it is the most flexible option I could think of).
  - The query will compute its values only from a pool of the latest 100 builds assigned to the runners in question. This is necessary to reduce traffic on the ci_builds table.
  - It will only be available to admins, which limits the impact on GitLab.com.
- We might need to replace the index_ci_builds_on_runner_id_and_id_desc index with one that also includes the status column, so that jobs that have not actually started are removed from the calculation.
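A rough Python model of the proposed query: take the latest 100 builds for the selected runners (newest id first, as the index_ci_builds_on_runner_id_and_id_desc index suggests), drop builds that have not started, and report the p50 (median) queued duration. The `Build` record, `queued_duration_p50`, and the `NOT_STARTED_STATUSES` set are hypothetical; which statuses count as "not started", and whether filtering happens before or after the 100-build limit, are assumptions here. The real implementation would be SQL against ci_builds.

```python
import statistics
from dataclasses import dataclass
from typing import Iterable, Optional, Set

# Assumption: statuses treated as "not actually started"; the proposal does not list them.
NOT_STARTED_STATUSES = {"created", "waiting_for_resource", "preparing", "pending"}
POOL_SIZE = 100  # latest 100 builds, per the limitation in the proposal


@dataclass(frozen=True)
class Build:
    """Hypothetical stand-in for a ci_builds row."""
    id: int
    runner_id: int
    status: str
    queued_duration: float  # seconds


def queued_duration_p50(
    builds: Iterable[Build], runner_ids: Set[int], pool_size: int = POOL_SIZE
) -> Optional[float]:
    """Median queued duration over the newest `pool_size` started builds of `runner_ids`."""
    # Assumption: filter out unstarted builds first, then apply the pool limit.
    started = [
        b for b in builds
        if b.runner_id in runner_ids and b.status not in NOT_STARTED_STATUSES
    ]
    pool = sorted(started, key=lambda b: b.id, reverse=True)[:pool_size]
    if not pool:
        return None
    return statistics.median([b.queued_duration for b in pool])
```

Capping the pool keeps the cost bounded no matter how many builds a runner has run. Filtering before the cap means a burst of pending jobs cannot push all started jobs out of the sample; that is the effect adding status to the index would make cheap.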
Links / references
This page may contain information related to upcoming products, features and functionality. It is important to note that the information presented is for informational purposes only, so please do not rely on the information for purchasing or planning purposes. Just like with all projects, the items mentioned on the page are subject to change or delay, and the development, release, and timing of any products, features, or functionality remain at the sole discretion of GitLab Inc.