Support instance-agnostic bots

Context

Currently, bots connect to a single specific instance name in BuildGrid, and then accept only jobs that are submitted with the same instance name. This wastes compute: bots that are technically capable of executing jobs for another instance name sit idle, whilst that other instance name may have a large queue.

Additionally, because bots are tied to instance names, adding a new instance name to BuildGrid for a client workflow requires creating and deploying new dedicated bots, which complicates and discourages the use of instance names in this way.

We currently have an "instance pools" feature in our configuration, where the name field of an entry in the instances section of our configuration can take an array of names. These names are then all treated as aliases for each other. This provides some help in sharing compute, but is not sufficiently flexible to handle cases where instance names need different BuildGrid configuration but the same bots.

Description

This MR reworks the Bots service in BuildGrid, adding support for a wildcard parent in CreateBotSession requests. Setting parent to * rather than an instance name will create a bot which ignores instance names during job assignment. These bots can still be limited to specific types of work using platform properties, but provide a more effective way to share resources across many instances than the current instance pools feature.
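The assignment rule described above can be sketched as follows. This is a minimal illustration, not the code in this MR: the Bot and Job classes, their fields, and the can_run helper are hypothetical names, and platform properties are simplified to sets of strings per key.

```python
from dataclasses import dataclass, field
from typing import Dict, Set

WILDCARD = "*"  # the wildcard parent value described above


@dataclass
class Bot:
    # Hypothetical model of a bot session; not BuildGrid's real class.
    name: str
    parent: str  # an instance name, or "*" for instance-agnostic bots
    properties: Dict[str, Set[str]] = field(default_factory=dict)


@dataclass
class Job:
    # Hypothetical model of a queued job.
    name: str
    instance_name: str
    requirements: Dict[str, Set[str]] = field(default_factory=dict)


def can_run(bot: Bot, job: Job) -> bool:
    """Return True if the bot is eligible for the job.

    A wildcard bot skips the instance-name check entirely, but the
    platform property check still applies to every bot.
    """
    if bot.parent != WILDCARD and bot.parent != job.instance_name:
        return False
    return all(
        values <= bot.properties.get(key, set())
        for key, values in job.requirements.items()
    )


shared = Bot("shared-1", WILDCARD, {"OSFamily": {"linux"}})
pinned = Bot("pinned-1", "team-a", {"OSFamily": {"linux"}})
job_b = Job("build", "team-b", {"OSFamily": {"linux"}})
job_mac = Job("build-mac", "team-b", {"OSFamily": {"macos"}})
```

In this sketch, shared is eligible for job_b while pinned is not, and neither is eligible for job_mac, since platform properties still restrict wildcard bots.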

As part of this change, the actual implementation of the Bots service is moved into the servicer class itself rather than the per-instance BotsInterface, since the instanced model no longer makes much sense with the implementation. The BotsInterface now simply wraps a Scheduler object, to support this feature without requiring configuration changes.
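As a rough picture of that structure, the sketch below shows a servicer that owns the assignment logic and per-instance wrappers that only hold a scheduler. It assumes one scheduler per configured instance; the method names (add_instance, schedulers_for, next_job, pop_job) are hypothetical, and the Scheduler here is a stand-in rather than BuildGrid's real class.

```python
class Scheduler:
    """Stand-in scheduler that only holds a FIFO queue of job names."""

    def __init__(self):
        self.queue = []

    def pop_job(self):
        # Return the oldest queued job, or None if the queue is empty.
        return self.queue.pop(0) if self.queue else None


class BotsInterface:
    """Per-instance wrapper, kept so existing configuration still works."""

    def __init__(self, scheduler):
        self.scheduler = scheduler


class BotsServicer:
    """Holds the service logic that used to live in each BotsInterface."""

    def __init__(self):
        self.instances = {}

    def add_instance(self, name, interface):
        self.instances[name] = interface

    def schedulers_for(self, parent):
        # A wildcard parent may draw work from every registered instance.
        if parent == "*":
            return [i.scheduler for i in self.instances.values()]
        return [self.instances[parent].scheduler]

    def next_job(self, parent):
        for scheduler in self.schedulers_for(parent):
            job = scheduler.pop_job()
            if job is not None:
                return job
        return None


servicer = BotsServicer()
sched_a, sched_b = Scheduler(), Scheduler()
servicer.add_instance("team-a", BotsInterface(sched_a))
servicer.add_instance("team-b", BotsInterface(sched_b))
sched_b.queue.append("job-1")
```

With this setup, a bot created with parent team-a finds no work, while a bot created with parent * picks up job-1 from the team-b queue.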

Longer-term we should make the Bots service an UninstancedServicer, but this will need a configuration deprecation period and likely makes sense as part of a larger configuration restructuring to help with other difficulties when using many instance names.