Draft: Create a new instance_critical runner type to separate builds on shared runners into a separate queue (!60946)

drew stachon requested to merge tiered-runners into master May 04, 2021

What does this MR do?

This is an explanation of my idea for splitting builds on shared runners into two queues. We simply add a fourth runner type (and get the free observability that comes with it!) and add a new query to be used by runners designated as the new type.

Setting up the new builds queue is slightly more complicated. We add an indexed column to project_features allowing the new query:

.joins('INNER JOIN project_features ON ci_builds.project_id = project_features.project_id AND project_features.critical_shared_runners_access_level > 0')

to execute quickly.
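To make the effect of that join concrete, here is a minimal plain-Ruby sketch of the filter it expresses. It does not touch Rails or a database: the structs, the sample data, and the helper name builds_for_critical_runners are all hypothetical, and only the column names come from this MR. The set lookup stands in for the indexed column.

```ruby
require "set"

# Plain-Ruby model of the join; no Rails or database involved.
# Struct and helper names are hypothetical, column names mirror the MR.
Build = Struct.new(:id, :project_id)
ProjectFeature = Struct.new(:project_id, :critical_shared_runners_access_level)

# Models:
#   INNER JOIN project_features
#     ON ci_builds.project_id = project_features.project_id
#    AND project_features.critical_shared_runners_access_level > 0
def builds_for_critical_runners(builds, project_features)
  # Collect the projects that have critical access enabled.
  enabled_project_ids = project_features
    .select { |feature| feature.critical_shared_runners_access_level > 0 }
    .map(&:project_id)
    .to_set

  # Keep only builds belonging to one of those projects.
  builds.select { |build| enabled_project_ids.include?(build.project_id) }
end

# Made-up sample data: only project 2 has a non-zero access level.
features = [ProjectFeature.new(1, 0), ProjectFeature.new(2, 20), ProjectFeature.new(3, 0)]
builds = [Build.new(101, 1), Build.new(102, 2), Build.new(103, 3), Build.new(104, 2)]

critical_queue = builds_for_critical_runners(builds, features)
puts critical_queue.map(&:id).inspect # prints [102, 104]
```

Builds from projects with an access level of 0 never enter the critical queue, which is what keeps it small.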

Upsides

This is a relatively light-touch approach to introducing a new queue for paid builds. I modified only as much of the existing builds query as I needed to make this work. This is not an optimization effort for #builds_for_shared_runner.
It's something we can ship quickly with no deployment-time effects. No runners will have this designation at first, so existing queues should be entirely unaffected. The new query will be "dead code" on arrival, and we can test slowly and internally.
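The "dead code on arrival" point can be sketched as a dispatch on runner type. This is an illustration only, not the code in this MR: the numeric value 4, the helper queue_for, and every query name other than builds_for_shared_runner are assumptions made for the example.

```ruby
# Hypothetical runner types; the value 4 for the new type is an assumption.
RUNNER_TYPES = {
  instance_type: 1,
  group_type: 2,
  project_type: 3,
  instance_critical: 4
}.freeze

# Returns the name of the builds query a runner of the given type would use.
# Query names other than :builds_for_shared_runner are illustrative.
def queue_for(runner_type)
  raise ArgumentError, "unknown runner type: #{runner_type}" unless RUNNER_TYPES.key?(runner_type)

  case runner_type
  when :instance_critical then :builds_for_critical_shared_runner
  when :instance_type then :builds_for_shared_runner
  when :group_type then :builds_for_group_runner
  when :project_type then :builds_for_project_runner
  end
end

# At deploy time no runner carries the new designation...
registered_runners = [:instance_type, :instance_type, :group_type, :project_type]
queries_in_use = registered_runners.map { |type| queue_for(type) }.uniq

# ...so the new query is never selected.
puts queries_in_use.include?(:builds_for_critical_shared_runner) # prints false
```

Existing runner types keep resolving to their existing queries, so the new branch only runs once a runner is explicitly given the new type.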

Downsides

This solution is not extensible. If we (or another large multi-tenant host of GitLab) want to have separate runner groups for myriad plan tiers, this won't really help.
This solution is, by design, not scalable. The exact worst thing to do would be to say "Well ALL my builds are important, ALL the runners are critical!" The only thing we hope to achieve here is to get certain builds picked up at lower latency / a higher SLO by virtue of them being in a smaller queue. We are making no other improvements. If the queue gets larger, the advantage will diminish.
It requires a new database index, but I think it's one we're especially willing to pay for.
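The scalability caveat above can be put into rough numbers with a toy model. It assumes builds are picked up in order at a fixed rate, which is a simplification and not how the real scheduler behaves; the function name and the figures are invented for illustration.

```ruby
# Toy model (assumption): builds are picked up in order at a fixed rate,
# so the wait for a newly queued build grows linearly with the queue length.
def estimated_wait_seconds(pending_builds, pickups_per_second)
  raise ArgumentError, "pickup rate must be positive" unless pickups_per_second.positive?

  pending_builds / pickups_per_second.to_f
end

# Made-up figures: a small critical queue versus a large shared queue,
# both drained at the same rate.
small_queue_wait = estimated_wait_seconds(100, 10)
large_queue_wait = estimated_wait_seconds(5000, 10)

puts small_queue_wait # prints 10.0
puts large_queue_wait # prints 500.0
```

Under this model the benefit comes entirely from the queue being short. If everything is marked critical, the critical queue grows to the size of the shared one and the gap closes.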

Is this ready?

No.

I have not written any tests.
I have not written any mechanism for enabling access to critical shared runners.
I have not thought about how to protect write access to critical_shared_runners_access_level.

What should we do right now?

Please leave feedback regarding the viability of this approach from a job-registration perspective. That operation is the critical need for this effort, so I won't continue with anything else unless we can validate that this approach will be sufficiently performant for our needs on gitlab.com.

Screenshots (strongly suggested)

Does this MR meet the acceptance criteria?

Certainly not.

Conformity

📋 Does this MR need a changelog?
- I have included a changelog entry.
- I have not included a changelog entry because _____.
Documentation (if required)
Code review guidelines
Merge request performance guidelines
Style guides
Database guides
Separation of EE specific content

Availability and Testing

Review and add/update tests for this feature/bug. Consider all test levels. See the Test Planning Process.
Tested in all supported browsers
Informed Infrastructure department of a default or new setting change, if applicable per definition of done

Security

If this MR contains changes to processing or storing of credentials or tokens, authorization and authentication methods and other items described in the security review guidelines:

Label as security and @ mention @gitlab-com/gl-security/appsec
The MR includes necessary changes to maintain consistency between UI, API, email, or other methods
Security reports checked/validated by a reviewer from the AppSec team

Edited May 04, 2021 by drew stachon
