Proposal: Replace service inventory with service catalog
Context
Right now, the first step of onboarding is opening an MR to add a service to the provisioner project, e.g.:
{
  "inventory": [
    {
      "name": "my-service",
      "project": "https://gitlab.com/my-org/my-project"
    }
  ]
}
After approval and merging, a pipeline is run to provision the service. While some optional fields are specific to Runway, the inventory is in essence a service catalog.
Proposal
Replace the Runway service inventory with the GitLab SaaS Platforms service catalog. The first step of onboarding would instead be opening an MR to add a service to the runbooks project, e.g.:
services:
  - name: my-service
    owner: my-stage-group
    runway:
      project: https://gitlab.com/my-org/my-project
After approval and merging, a cross-project pipeline would be run to provision the service instead. Nothing else about Runway changes.
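To make the mapping between the two formats concrete, here is a minimal sketch, assuming only the fields shown in the two examples above. The function names (`inventory_to_catalog`, `runway_services`) and the `owners` mapping are hypothetical and exist only for illustration. The inventory has no owner field, so ownership has to be supplied separately, and any optional Runway-specific fields are ignored here.

```python
import json


def inventory_to_catalog(inventory_doc, owners):
    """Map provisioner inventory entries to service catalog entries.

    `owners` maps service name -> stage group. The inventory has no owner
    field, so this mapping is an assumption that must be supplied by hand.
    """
    services = []
    for entry in inventory_doc["inventory"]:
        services.append({
            "name": entry["name"],
            "owner": owners[entry["name"]],
            # Runway-specific settings are nested under a `runway` key.
            "runway": {"project": entry["project"]},
        })
    return {"services": services}


def runway_services(catalog_doc):
    """Return only the catalog entries that have a `runway` key."""
    return [s for s in catalog_doc["services"] if "runway" in s]


inventory = json.loads("""
{
  "inventory": [
    {"name": "my-service", "project": "https://gitlab.com/my-org/my-project"}
  ]
}
""")

catalog = inventory_to_catalog(inventory, owners={"my-service": "my-stage-group"})

# Add a hypothetical non-Runway service to show that it is skipped.
catalog["services"].append({"name": "other-service", "owner": "other-group"})

for service in runway_services(catalog):
    print(service["name"], service["runway"]["project"])
```

A cross-project pipeline could use a filter like `runway_services` to decide which catalog entries to provision, leaving non-Runway services in the same catalog untouched. How the real pipeline would read the catalog is not specified in this proposal.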
Benefits
- Services would automatically have service and metrics catalog entries, which are a requirement for the production readiness process
- Services would automatically be evaluated against the service maturity model, which could be expanded with WASF
- Services would automatically be owned by a team in teams.yml, which includes context Runway can use for alerts, error budgets, cost attribution, and more
- Services would automatically include service overview dashboards
- Services would automatically include SLIs and SLOs
- Services would automatically include saturation monitoring and capacity planning
- Services would automatically include docs for on-call SREs
- Services would automatically have just the right amount of friction, so that it is not a little too easy to provision infra with a bunch of test services
- Services are only onboarded once; we can check off all these boxes and more by simply changing the single source of truth
Trade-offs
- CI build times in runbooks are brutal; this is an opportunity for us to speed up builds for everyone
- Adding a new service in runbooks can be trial and error; this is an opportunity for us to add scaffold tooling for everyone (i.e. non-Runway services too)
- Runway would be further coupled to runbooks, in addition to the existing coupling with the GitLab SaaS Metrics Platform (i.e. Thanos)
- Runbooks can seem a little overwhelming to non-infra folks; this is an opportunity for us to make infra best practices clearer to stage group owners
- Experiment and production services would exist in the same service catalog; this is an opportunity for us to formally make this distinction
Long-term
It is becoming clearer that a descriptor format that lives in source projects is the direction that many tools like Backstage and its alternatives are heading. You could even imagine a custom solution with runway.yml, Renovate bot, etc.
This proposal would be a simple change that could serve as an interim step to help us get there by improving both Runway and the GitLab SaaS Platforms service catalog.