Draft: Create a new instance_critical runner type to separate builds on shared runners into a separate queue

drew stachon requested to merge tiered-runners into master

What does this MR do?

This is an explanation of my idea for splitting builds on shared runners into two queues. We simply add a fourth runner type (and get the free observability that comes with it!) and add a new query to be used by runners designated as the new type.

Setting up the new builds queue is slightly more complicated. We add an indexed column to project_features allowing the new query:

.joins('INNER JOIN project_features ON ci_builds.project_id = project_features.project_id AND project_features.critical_shared_runners_access_level > 0')

to execute quickly.
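To make the effect of that join concrete, here is a minimal plain-Ruby sketch of the filtering it performs. It is not the MR's code: `Build`, `ProjectFeature`, and `critical_queue` are hypothetical in-memory stand-ins for `ci_builds` rows, `project_features` rows, and the new query. Only the join condition itself is taken from the description above.

```ruby
require 'set'

# Hypothetical stand-ins for ci_builds and project_features rows.
Build = Struct.new(:id, :project_id)
ProjectFeature = Struct.new(:project_id, :critical_shared_runners_access_level)

# Mirrors the INNER JOIN condition: a build is kept only when its project has
# a project_features row with critical_shared_runners_access_level > 0.
# Builds whose project has no project_features row are dropped, as an
# inner join would drop them.
def critical_queue(builds, features)
  allowed_project_ids = features
    .select { |f| f.critical_shared_runners_access_level > 0 }
    .map(&:project_id)
    .to_set

  builds.select { |b| allowed_project_ids.include?(b.project_id) }
end

builds = [Build.new(1, 10), Build.new(2, 20), Build.new(3, 30)]
features = [ProjectFeature.new(10, 20), ProjectFeature.new(20, 0)]

# Project 10 is enabled, project 20 has level 0, project 30 has no row.
p critical_queue(builds, features).map(&:id)
```

In the real query the database does this work, and the indexed column is what would let it find the enabled projects without scanning every `project_features` row.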

Upsides

  • This is a relatively light-touch approach to introducing a new queue for paid builds. I modified only as much of the existing builds query as I needed to make this work. This is not an optimization effort for #builds_for_shared_runner.
  • It's something we can ship quickly with no deployment-time effects. No runners will have this designation, and so queues should be entirely unimpacted. The new query will be "dead code" on arrival, and we can test slowly and internally.
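The "dead code on arrival" point can be illustrated with a small dispatch sketch. Everything here is an assumption for illustration: the three existing runner type names, the `queue_for` helper, and the queue symbols are hypothetical and are not taken from this MR; only `instance_critical` comes from the title.

```ruby
# Assumed names for the three existing runner types, plus the new fourth one.
RUNNER_TYPES = %i[instance_type group_type project_type instance_critical].freeze

# Hypothetical dispatch: each runner type maps to the query that feeds it.
def queue_for(runner_type)
  case runner_type
  when :instance_critical then :critical_shared_builds # new query
  when :instance_type     then :shared_builds          # existing shared query
  when :group_type        then :group_builds
  when :project_type      then :project_builds
  else raise ArgumentError, "unknown runner type: #{runner_type.inspect}"
  end
end

# With no runner registered as :instance_critical, the new branch never runs.
registered_runner_types = %i[instance_type instance_type group_type project_type]
queues_in_use = registered_runner_types.map { |type| queue_for(type) }.uniq
p queues_in_use.include?(:critical_shared_builds)
```

Until a runner is given the new designation, the existing queues are selected exactly as before.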

Downsides

  • This solution is not extensible. If we (or another large multi-tenant host of GitLab) want to have separate runner groups for myriad plan tiers, this won't really help.
  • This solution is, by design, not scalable. The exact worst thing to do would be to say "Well ALL my builds are important, ALL the runners are critical!" The only thing we hope to achieve here is to have certain builds picked up at lower latency / higher SLO by virtue of them being in a smaller queue. We are making no other improvements. If the queue gets larger, the advantage will diminish.
  • It requires a new database index, but I think it's one we're especially willing to pay for.
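The scalability concern can be made concrete with a toy model. This is only a sketch under strong assumptions: a strictly first-in-first-out queue drained at a constant rate, with made-up numbers. `expected_wait_seconds` is a hypothetical helper, not something that exists in GitLab.

```ruby
# Toy model: a FIFO queue drained at a constant rate by a pool of runners.
# Returns how many seconds a newly enqueued build waits before pickup.
def expected_wait_seconds(builds_ahead, picks_per_second)
  raise ArgumentError, 'picks_per_second must be positive' unless picks_per_second.positive?

  builds_ahead.to_f / picks_per_second
end

# Illustrative numbers only.
p expected_wait_seconds(1000, 50) # large shared queue, large runner pool
p expected_wait_seconds(50, 5)    # small critical queue, small runner pool
p expected_wait_seconds(500, 5)   # critical queue after everyone opts in
```

In this model the critical queue only wins while it stays short relative to the runners serving it. Once most builds are marked critical, its wait exceeds that of the shared queue.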

Is this ready?

No.

  • I have not written any tests.
  • I have not written any mechanism for enabling access to critical shared runners.
  • I have not thought about how to protect write access to critical_shared_runners_access_level.

What should we do right now?

Please leave feedback regarding the viability of this approach from a jobs-registration perspective. That operation is the critical need for this effort, so I won't continue with anything else unless we can validate that this approach will be sufficiently performant for our needs on gitlab.com.

Screenshots (strongly suggested)

Does this MR meet the acceptance criteria?

Certainly not.

Conformity

Availability and Testing

Security

If this MR contains changes to processing or storing of credentials or tokens, authorization and authentication methods and other items described in the security review guidelines:

  • Label as security and @ mention @gitlab-com/gl-security/appsec
  • The MR includes necessary changes to maintain consistency between UI, API, email, or other methods
  • Security reports checked/validated by a reviewer from the AppSec team
Edited by drew stachon