Provide estimated wait times for instance runners

Release notes

If a CI/CD job won't start, how do you know when it will, or whether busy runners are the root cause? With this feature in the Admin Area, you'll be able to see the estimated wait time for all instance runners, which gives admins more insight into how busy their runners are.

Problem to solve

JTBD

When I am troubleshooting CI jobs, I want to quickly know if the problem is related to the job execution agent, so I can resolve the problem and continue working.

User story

As a Platform Engineer who is checking on CI jobs' performance for an instance, I need to see how busy the runner is so that I can quickly determine if there are performance issues with that runner and the underlying host system or platform.

Further details:

Sometimes developers report that their job is pending and ask the admin if something is wrong with the runner. The admin has no insight into how many jobs the runner has in its queue, so they can't give an answer. Also, from an infrastructure perspective, the admin may see the same runner continuously in a "pending" state, but they have no data to explain it, so they have to guess whether they should create another runner to pick up those types of jobs.

Intended users

Priyanka (Platform Engineer)

User experience goal

The user should be able to quickly see how long their instance runners are taking to pick up jobs.

Proposal

Add a performance insights button.
Add usage ping tracking to the button.
The button should bring up a modal that shows the average time it takes for all instance runners to pick up a job.
Designs in design management
Figma file

Technical details

We know the waiting time for a job, and we know the runner that ran that job, along with the tags used. We will average the wait time of the last X jobs run by that runner with a given set of tags to produce the "estimated job wait time".
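The calculation described above could be sketched roughly as follows. This is a minimal illustration, not the proposed implementation: the `JobRecord` shape, the `estimated_wait_seconds` helper, the use of the job ID as a recency order, and the choice to match tags by exact set equality are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import datetime
from statistics import mean
from typing import FrozenSet, Iterable, Optional


@dataclass(frozen=True)
class JobRecord:
    # Hypothetical record shape; field names are illustrative only.
    job_id: int
    runner_id: int
    tags: FrozenSet[str]
    queued_at: datetime
    started_at: datetime


def estimated_wait_seconds(
    jobs: Iterable[JobRecord],
    runner_id: int,
    tags: Iterable[str],
    last_n: int = 100,
) -> Optional[float]:
    """Average wait (seconds) of the last `last_n` jobs run by one runner with one tag set."""
    wanted = frozenset(tags)
    # Assumption: tags match by exact set equality.
    matching = [j for j in jobs if j.runner_id == runner_id and j.tags == wanted]
    # Assumption: a higher job_id means a more recent job.
    latest = sorted(matching, key=lambda j: j.job_id, reverse=True)[:last_n]
    if not latest:
        return None  # No history for this runner/tag combination.
    return mean((j.started_at - j.queued_at).total_seconds() for j in latest)
```

Returning `None` when there is no matching history leaves it to the caller to decide how to present a runner with no data, rather than reporting a misleading wait of zero.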

Available Tier

Ultimate

Tier rationale: At GitLab, we make feature tiering decisions based on the likely buyer persona and thinking about who cares most about the feature. Specifically for Runner Fleet, it has primarily been users at organizations with hundreds to thousands of Runners that have provided us feedback on the problems of managing Runners at scale. Typically these organizations are Ultimate plan customers. This has been the primary reason for including Fleet management features in Ultimate. As we continue to get more feedback from users and customers on these features, we will evaluate whether to move specific Fleet management features to Premium.

Feature Usage Metrics

Track page views of Admin Area > Runners via usage ping.

Backend implementation proposal

Create a jobsStatistics GraphQL object under CiRunnerConnection (database query plan):

query {
  runners(type: INSTANCE_TYPE) {
    jobsStatistics {
      queuedDuration {
        p50
      }
    }
  }
}

Limitations:
- The API will be marked as alpha, since it is not yet clear whether it will match future iterations of the functionality (it is the most flexible design I could think of, though).
- The query will compute its values from a pool of the latest 100 builds assigned to the runners in question. This is necessary to reduce traffic on the ci_builds table.
- It will only be available to admins (which will limit the impact on .com).

We might end up needing to replace the index_ci_builds_on_runner_id_and_id_desc index with one that includes the status column, so that jobs that have not actually started are excluded from the calculation.
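To make the pooling and filtering concrete, here is a small sketch of how a p50 queued duration might be derived from the latest 100 builds while skipping builds that never started. The tuple layout and the `queued_duration_p50` name are hypothetical, and the real computation would happen in the database query rather than in application code.

```python
from statistics import median
from typing import Iterable, Optional, Tuple

# Hypothetical input: (build_id, queued_duration_seconds), where the duration
# is None for a build that has been assigned but has not actually started.
Build = Tuple[int, Optional[float]]


def queued_duration_p50(builds: Iterable[Build], pool_size: int = 100) -> Optional[float]:
    """Median queued duration over the latest `pool_size` builds, ignoring unstarted ones."""
    # Take the pool first (newest build IDs), mirroring a lookup ordered by id DESC.
    latest = sorted(builds, key=lambda b: b[0], reverse=True)[:pool_size]
    # Then drop builds that never started, so they do not skew the statistic.
    started = [duration for _, duration in latest if duration is not None]
    return median(started) if started else None
```

In this sketch the pool is selected before unstarted builds are filtered out, so the statistic may be based on fewer than 100 values. Filtering on status in the index lookup itself, as the note above suggests, would let the pool contain only started builds. Note also that `statistics.median` averages the two middle values for even-sized pools, which may differ from how the database computes the percentile.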

Links / references

This page may contain information related to upcoming products, features and functionality. It is important to note that the information presented is for informational purposes only, so please do not rely on the information for purchasing or planning purposes. Just like with all projects, the items mentioned on the page are subject to change or delay, and the development, release, and timing of any products, features, or functionality remain at the sole discretion of GitLab Inc.

Edited Jan 26, 2023 by Darren Eastman