Runners platform control-plane prototype
Background
Many discussions linked from https://ops.gitlab.net/gitlab-com/gl-infra/ci-runners/hosted-runners-mgmt/-/merge_requests/6 raise the possibility of a schema-driven control plane that manages both the runner deployments and the supporting infrastructure.
Many existing platforms follow this approach, for the following reasons:
- Moving beyond traditional Terraform-based approaches that rely on CI jobs, repeated terraform plan/apply loops, and external state management.
- API-first design and event-driven automation. This allows a GitOps approach for primary configuration and also provides a basis for a self-service platform.
- Self-healing systems: health checks and automated remediation mean fewer manual interventions are required.
- Monitoring of control planes and created resources provides a basis for reliable operations and SLOs.
- Ability to perform predictive scaling based on historical data and incoming work.
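To make the contrast with plan/apply loops concrete, the following is a minimal sketch of the reconciliation idea behind most control planes: desired state is compared with observed state on every pass, and the difference becomes a list of actions. The ShardState model, the reconcile function, and the action strings are hypothetical names chosen for illustration; they are not taken from any existing operator.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ShardState:
    """State of one runner shard (hypothetical model for illustration)."""
    replicas: int
    healthy: bool = True


def reconcile(desired: dict[str, ShardState],
              observed: dict[str, ShardState]) -> list[str]:
    """Return the actions needed to move observed state towards desired state."""
    actions = []
    for name, want in sorted(desired.items()):
        have = observed.get(name)
        if have is None:
            # Declared but not yet running: create it.
            actions.append(f"create {name} replicas={want.replicas}")
        elif not have.healthy:
            # Running but failing health checks: automated remediation.
            actions.append(f"repair {name}")
        elif have.replicas != want.replicas:
            # Running with the wrong size: converge to the declared size.
            actions.append(f"scale {name} {have.replicas}->{want.replicas}")
    # Anything running that is no longer declared gets removed.
    for name in sorted(observed.keys() - desired.keys()):
        actions.append(f"delete {name}")
    return actions


if __name__ == "__main__":
    desired = {"shard-a": ShardState(2), "shard-b": ShardState(3)}
    observed = {"shard-b": ShardState(1), "shard-c": ShardState(1)}
    for action in reconcile(desired, observed):
        print(action)
```

A real controller would run this comparison continuously, triggered by events, rather than once per CI job. That is where the self-healing behaviour comes from.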
Suggested PoC
A timeboxed proof of concept should be created to demonstrate the possibilities.
Suggested technologies to use for this PoC:
- Kubernetes: Kubernetes-native tools provide a mature platform for building control planes and operators.
- Crossplane: has emerged as a tool for building platforms on top of Kubernetes.
- GitLab Runner Operator: an existing solution for provisioning runner deployments.
Suggested functionality:
- A basic runner shard resource schema to drive the Runner Operator.
- A simple example of infra management, e.g. creating a GCS cache bucket for the shard.
- Ability to auto-register the runner in a demo project that will demonstrate running pipelines.
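As a rough sketch of what a shard schema could look like, the example below models a shard as a small Python dataclass and renders it into two manifests: a Runner resource for the GitLab Runner Operator and a Bucket resource for a Crossplane GCP provider. The RunnerShard fields, the "-runner-cache" bucket naming, and the render helpers are hypothetical. The apiVersion values and spec field names are assumptions that should be checked against the CRDs actually installed in the cluster.

```python
import json
from dataclasses import dataclass, field


@dataclass
class RunnerShard:
    """Hypothetical shard schema; field names are illustrative only."""
    name: str
    gitlab_url: str
    token_secret: str            # name of the Secret holding the runner token
    tags: list[str] = field(default_factory=list)
    cache_location: str = "US"   # GCS location for the cache bucket


def render_runner(shard: RunnerShard) -> dict:
    """Render a Runner resource for the GitLab Runner Operator.

    Assumption: apiVersion and spec fields match the installed Runner CRD.
    """
    return {
        "apiVersion": "apps.gitlab.com/v1beta2",
        "kind": "Runner",
        "metadata": {"name": shard.name},
        "spec": {
            "gitlabUrl": shard.gitlab_url,
            "token": shard.token_secret,
            "tags": ",".join(shard.tags),
        },
    }


def render_cache_bucket(shard: RunnerShard) -> dict:
    """Render a GCS bucket resource for a Crossplane GCP provider.

    Assumption: apiVersion matches the provider version in use.
    """
    return {
        "apiVersion": "storage.gcp.upbound.io/v1beta1",
        "kind": "Bucket",
        "metadata": {"name": f"{shard.name}-runner-cache"},
        "spec": {"forProvider": {"location": shard.cache_location}},
    }


def render(shard: RunnerShard) -> list[dict]:
    """Expand one shard definition into all resources it owns."""
    return [render_runner(shard), render_cache_bucket(shard)]


if __name__ == "__main__":
    shard = RunnerShard(
        name="shard-a",
        gitlab_url="https://gitlab.example.com",
        token_secret="shard-a-token",
        tags=["linux", "small"],
    )
    print(json.dumps(render(shard), indent=2))
```

In the PoC itself this expansion would more likely live in a Crossplane composition or an operator than in a script. The sketch only shows the shape of the mapping from one shard definition to the resources it drives.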
Edited by Igor