Expose queue duration related metrics in job payload sent to the runner
What does this MR do and why?
This MR adds a queued_for
and project_jobs_running_on_instance_runners_count
fields sent in the job_info
section of job payload sent to GitLab Runner.
queued_for
, representing the difference between time of response generation and time of scheduling the job for queueing (which is set with the queued_at
field in the database record), Runner will know how long the job was being in the queue.
This value will be next used on the GitLab Runner side to generate a histogram metric, representing queueing times for each specific runner and [[runners]]
worker that asks for jobs. This will allow runner administrators to track queueing times of jobs targeting their runners.
Similarly project_jobs_running_on_instance_runners
, for jobs targeting instance runners (and only instance type runners!) will allow the runner administrators to see how scheduled jobs fit into the fair scheduling algorithm.
We will use such information, for example, to improve our SLI definitions for SaaS runners on GitLab.com. As currently our apdex is the same for all different runner types that we manage, as it's calculated from a general metric exposed from GitLab, which by design doesn't partition such information per runner.
With this change and the planned GitLab Runner change, each runner owner will be able to track such data with other runner metrics. And for us this means that we will be able to define different SLIs for each SaaS runner shard that we maintain.
The biggest value of this change is however in the fact, that this metric would become usable for self-hosted GitLab Runner instances. As for GitLab installations like GitLab.com, individual users who self-host their runners and would like to track queuing performance are unable to do that, as GitLab internal metrics are... well... internal job_queue_duration_seconds
exposed by GitLab can't be partitioned by the individual Runner ID, as such cardinality of data would quickly kill any Prometheus server.
By passing this data to Runner and exposing it there, each Runner owner can track the queuing timing of their own runner instances. Without a need for GitLab administrators to expose GitLab's metric and with the data being partitioned by each individual runner.
GitLab Runner update at gitlab-runner!3499 (merged).
Screenshots or screen recordings
N/A
How to set up and validate locally
Numbered steps to set up and validate the change are strongly suggested.
MR acceptance checklist
This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.
-
I have evaluated the MR acceptance checklist for this MR.
Merge request reports
Activity
added [Deprecated] Category:Runner devopsverify grouprunner sectionops labels
assigned to @tmaczukin
added featureenhancement typefeature + 1 deleted label
Suggested Reviewers (beta)
The individuals below may be good candidates to participate in the review based on various factors.
You can use slash commands in comments to quickly assign
/assign_reviewer @user1
.Suggested Reviewers @rspeicher
,@tkuah
,@marin
,@marcel.amirault
,@engwan
If you do not believe these suggestions are useful, please apply the label Bad Suggested Reviewer. You can also provide feedback for this feature on this issue:
https://gitlab.com/gitlab-org/gitlab/-/issues/357923
.Automatically generated by Suggested Reviewers Bot - an experimental ML-based recommendation engine created by ~"group::applied ml".
Edited by GitLab Reviewer-Recommender Bot3 Warnings 5f97f00c: The commit body should not contain more than 72 characters per line. For more information, take a look at our Commit message guidelines. featureaddition and featureenhancement merge requests normally have a documentation change. Consider adding a documentation update or confirming the documentation plan with the Technical Writer counterpart.
For more information, see:
- The Handbook page on merge request types.
- The definition of done documentation.
This merge request does not refer to an existing milestone. 1 Message CHANGELOG missing: If you want to create a changelog entry for GitLab FOSS, add the
Changelog
trailer to the commit message you want to add to the changelog.If you want to create a changelog entry for GitLab EE, also add the
EE: true
trailer to your commit message.If this merge request doesn't need a CHANGELOG entry, feel free to ignore this message.
Reviewer roulette
Changes that require review have been detected!
Please refer to the table below for assigning reviewers and maintainers suggested by Danger in the specified category:
Category Reviewer Maintainer backend Valery Sizov ( @vsizov
) (UTC+1, 1 hour behind@tmaczukin
)James Fargher ( @proglottis
) (UTC+1, 1 hour behind@tmaczukin
)To spread load more evenly across eligible reviewers, Danger has picked a candidate for each review slot, based on their timezone. Feel free to override these selections if you think someone else would be better-suited or use the GitLab Review Workload Dashboard to find other available reviewers.
To read more on how to use the reviewer roulette, please take a look at the Engineering workflow and code review guidelines. Please consider assigning a reviewer or maintainer who is a domain expert in the area of the merge request.
Once you've decided who will review this merge request, assign them as a reviewer! Danger does not automatically notify them for you.
If needed, you can retry the
danger-review
job that generated this comment.Generated by
Dangeradded 2171 commits
-
5b50f78e...646f1947 - 2170 commits from branch
master
- 80ca122b - Expose queued_for value in job payload sent to the runner
-
5b50f78e...646f1947 - 2170 commits from branch
added backend label
Allure report
allure-report-publisher
generated test report!review-qa-blocking:
test report for 5f97f00cexpand test summary
+-----------------------------------------------------------------------------------------+ | suites summary | +------------------------------------+--------+--------+---------+-------+-------+--------+ | | passed | failed | skipped | flaky | total | result | +------------------------------------+--------+--------+---------+-------+-------+--------+ | Plan | 47 | 0 | 1 | 43 | 48 | ❗ | | Version sanity check | 0 | 0 | 1 | 0 | 1 | ➖ | | Create | 28 | 0 | 1 | 18 | 29 | ❗ | | Manage | 34 | 0 | 2 | 30 | 36 | ❗ | | Verify | 14 | 0 | 1 | 12 | 15 | ❗ | | Feature flag handler sanity checks | 9 | 0 | 0 | 0 | 9 | ✅ | | Configure | 0 | 0 | 1 | 0 | 1 | ➖ | | Secure | 2 | 0 | 0 | 2 | 2 | ❗ | | Protect | 2 | 0 | 0 | 2 | 2 | ❗ | +------------------------------------+--------+--------+---------+-------+-------+--------+ | Total | 136 | 0 | 7 | 107 | 143 | ❗ | +------------------------------------+--------+--------+---------+-------+-------+--------+
added 1 commit
- 8642a2b8 - Expose queue related metrics job payload sent to the runner
mentioned in commit gitlab-runner@d38cb184
mentioned in merge request gitlab-runner!3499 (merged)
mentioned in commit gitlab-runner@8413780c
added 1 commit
- 98b99c93 - Expose queue related metrics job payload sent to the runner
@DarrenEastman @erushton This is my proposal for how we could expose metrics that we could next use to define different SLI rules for different SaaS runner shards. This would also allow us to start tracking queuing performance of
private
runners that we use for dev.gitlab.org, ops.gitlab.net and staging.gitlab.com.Finally, this will also allow self-hosted Runner users to track their own runners queuing performance.
Note that this is first from two parts. Second part - which creates and exposes the metric - is added in the Runner at gitlab-runner!3499 (merged).
What do you think about the idea?
Edited by Tomasz Maczukinadded 1 commit
- 3d6de1a8 - Expose queue related metrics job payload sent to the runner
@grzesiek Do you think you could find a moment to review this change?
requested review from @grzesiek
- Resolved by Tomasz Maczukin
removed review request for @grzesiek
Nice work @tmaczukin! I left a question about where to implement this logic in.
requested review from @grzesiek
- Resolved by Tomasz Maczukin
- Resolved by Tomasz Maczukin
- Resolved by Tomasz Maczukin
Thanks @tmaczukin! We are a bit closer to a good solution here! I left one question / suggestion about making one metric a bit less confusing here.
removed review request for @grzesiek
mentioned in epic gitlab-com/gl-infra&768 (closed)
mentioned in epic gitlab-com/gl-infra&804 (closed)
added 58117 commits
-
5f97f00c...3ae8a127 - 58116 commits from branch
master
- 69cc84a6 - Expose queue related metrics job payload sent to the runner
-
5f97f00c...3ae8a127 - 58116 commits from branch
added sectionci label and removed sectionops label
- A deleted user
added databasereview pending label
- Resolved by Tomasz Maczukin
3 Warnings featureaddition and featureenhancement merge requests normally have a documentation change. Consider adding a documentation update or confirming the documentation plan with the Technical Writer counterpart.
For more information, see:
- The Handbook page on merge request types.
- The definition of done documentation.
There were no new or modified feature flag YAML files detected in this MR. If the changes here are already controlled under an existing feature flag, please add
the feature flagexists. Otherwise, if you think the changes here don't need
to be under a feature flag, please add the label feature flagskipped, and
add a short comment about why we skipped the feature flag.For guidance on when to use a feature flag, please see the documentation.
This merge request does not refer to an existing milestone. 2 Messages CHANGELOG missing: If you want to create a changelog entry for GitLab FOSS, add the
Changelog
trailer to the commit message you want to add to the changelog.If you want to create a changelog entry for GitLab EE, also add the
EE: true
trailer to your commit message.If this merge request doesn't need a CHANGELOG entry, feel free to ignore this message.
This merge request adds or changes files that require a review from the Database team. This merge request requires a database review. To make sure these changes are reviewed, take the following steps:
- Ensure the merge request has database and databasereview pending labels. If the merge request modifies database files, Danger will do this for you.
- Prepare your MR for database review according to the docs.
- Assign and mention the database reviewer suggested by Reviewer Roulette.
If you no longer require a database review, you can remove this suggestion by removing the database label and re-running the
danger-review
job.Reviewer roulette
Changes that require review have been detected!
Please refer to the table below for assigning reviewers and maintainers suggested by Danger in the specified category:
Category Reviewer Maintainer backend Aaron Huntsman (
@ahuntsman
) (UTC-5, 7 hours behind@tmaczukin
)Nicolas Dular (
@nicolasdular
) (UTC+2, same timezone as@tmaczukin
)~"Verify" Reviewer review is optional for ~"Verify" Fabio Pitino (
@fabiopitino
) (UTC+1, 1 hour behind@tmaczukin
)~"Verify backend" Reviewer review is optional for ~"Verify backend" Marius Bobin (
@mbobin
) (UTC+3, 1 hour ahead of@tmaczukin
)To spread load more evenly across eligible reviewers, Danger has picked a candidate for each review slot, based on their timezone. Feel free to override these selections if you think someone else would be better-suited or use the GitLab Review Workload Dashboard to find other available reviewers.
To read more on how to use the reviewer roulette, please take a look at the Engineering workflow and code review guidelines. Please consider assigning a reviewer or maintainer who is a domain expert in the area of the merge request.
Once you've decided who will review this merge request, assign them as a reviewer! Danger does not automatically notify them for you.
If needed, you can retry the
danger-review
job that generated this comment.Generated by
Danger@grzesiek Finally, I had some time to get back to this MR. I've applied some of the changes we've discussed and resolved the conflicts.
What do you think about this MR now?
requested review from @grzesiek
@tmaczukin Some end-to-end (E2E) tests have been selected based on the stage label on this MR.Please start the
trigger-omnibus-and-follow-up-e2e
job in theqa
stage and ensure the tests infollow-up-e2e:package-and-test-ee
pipeline are passing before this MR is merged. (The E2E test pipeline is computationally intensive and we cannot afford running it automatically for all pushes/rebases. Therefore, this job must be triggered manually after significant changes at least once.)If you would like to run all E2E tests, please apply the pipeline:run-all-e2e label and trigger a new pipeline. This will run all tests in
e2e:package-and-test
pipeline.The E2E test jobs are allowed to fail due to flakiness. For the list of known failures please refer to the latest pipeline triage issue.
Once done, please apply the
emoji on this comment.For any questions or help in reviewing the E2E test results, please reach out on the internal #quality Slack channel.
- Resolved by Tomasz Maczukin
- Resolved by Tomasz Maczukin
I left two questions @tmaczukin!
removed review request for @grzesiek