Runway (Experimentation Spaces) Architecture
The goal of this issue is to lay out a high-level architecture that divides the Experimentation Spaces product into distinct responsibilities, so that we can work on them independently and, if needed, swap out individual components.
Responsibilities
The responsibilities we have identified thus far are:
- Provisioning
- Deployment
- Reconciliation
- Runtime
- Observability
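By designing interfaces between them, we can reduce the risk of lock-in to a particular implementation. This needs to be balanced against the opposite risk: interfaces so generic that they only support the lowest common denominator and end up with the worst of all worlds.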
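To make the interface idea concrete, here is a minimal sketch, assuming Python `typing.Protocol` as the contract mechanism. Every method name and signature below is hypothetical; the real interfaces are still to be designed. The point is that caller code depends only on a protocol, so a backend can be swapped without touching it.

```python
from dataclasses import dataclass
from typing import Optional, Protocol, runtime_checkable


@dataclass(frozen=True)
class SpaceRequest:
    """Hypothetical shape of "create an experimentation space for me"."""
    name: str
    owner: str


# One protocol per responsibility; all method names are illustrative only.
@runtime_checkable
class Provisioner(Protocol):
    def provision(self, request: SpaceRequest) -> None: ...
    def decommission(self, space: str) -> None: ...


@runtime_checkable
class Deployer(Protocol):
    def deploy(self, space: str, image: str) -> None: ...
    def rollback(self, space: str) -> None: ...


@runtime_checkable
class Reconciler(Protocol):
    def reconcile(self, space: str) -> None: ...


@runtime_checkable
class Runtime(Protocol):
    def run(self, space: str, image: str, replicas: int) -> None: ...
    def endpoint(self, space: str) -> Optional[str]: ...


@runtime_checkable
class Observability(Protocol):
    def record(self, space: str, metric: str, value: float) -> None: ...


class InMemoryRuntime:
    """Fake backend: it never inherits from Runtime, it just matches its shape."""

    def __init__(self) -> None:
        self.workloads: dict[str, tuple[str, int]] = {}

    def run(self, space: str, image: str, replicas: int) -> None:
        self.workloads[space] = (image, replicas)

    def endpoint(self, space: str) -> Optional[str]:
        return None  # this fake never exposes an endpoint


def start_workload(runtime: Runtime, space: str, image: str) -> None:
    """Caller code depends only on the Runtime protocol, not on a backend."""
    runtime.run(space, image, replicas=1)
```

Structural typing keeps backends decoupled from the contract. The flip side is the trade-off noted above: the narrower the protocol, the more backend-specific features it hides.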
Provisioning
https://gitlab.com/gitlab-com/gl-infra/platform/experimentation-spaces/-/issues/9
The provisioning process is responsible for taking a request ("create an experimentation space for me") and stamping out the minimum required infrastructure for that space. It also covers decommissioning when a space is no longer needed.
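The provision/decommission lifecycle described above can be sketched as follows, assuming provisioning is idempotent (an assumption, not a stated requirement) and using an in-memory dictionary in place of real cloud APIs. The resource names in `MINIMUM_RESOURCES` are hypothetical placeholders for whatever the minimum infrastructure turns out to be.

```python
from dataclasses import dataclass, field

# Hypothetical minimum resource set; the real list is still to be designed.
MINIMUM_RESOURCES = ("project", "service_account", "dns_record")


@dataclass
class InMemoryProvisioner:
    """Tracks which resources exist per space (stand-in for a cloud API)."""
    resources: dict[str, set[str]] = field(default_factory=dict)

    def provision(self, space: str) -> set[str]:
        """Create any missing minimum resources and return what was created.

        Calling this twice is safe: the second call creates nothing.
        """
        existing = self.resources.setdefault(space, set())
        missing = set(MINIMUM_RESOURCES) - existing
        existing |= missing  # mutates the set stored in self.resources
        return missing

    def decommission(self, space: str) -> set[str]:
        """Remove every resource belonging to the space and return them."""
        return self.resources.pop(space, set())
```

Making both operations safe to repeat means a failed or retried request never leaves a space half-provisioned or double-provisioned.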
Deployment
https://gitlab.com/gitlab-com/gl-infra/platform/experimentation-spaces/-/issues/10
The deployment process is responsible for taking an artifact (e.g. a Docker image) from a customer and bringing it into a runtime. This includes rollout strategies, rollbacks, canarying, multi-environment promotion, and diagnostic tools for failed deploys. Some of these capabilities may also be delegated to the runtime. There should also be a standard way of connecting an existing code base to a deployment.
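Canarying, multi-environment promotion and rollback combine naturally into one loop. The following is a minimal sketch under stated assumptions: the environment names in `ENVIRONMENTS` are hypothetical, and `healthy` is a placeholder callback for whatever health signal the deployment process ends up using.

```python
from typing import Callable

# Hypothetical promotion order; the real environments are not yet defined.
ENVIRONMENTS = ("canary", "staging", "production")


def promote(
    image: str,
    current: dict[str, str],
    healthy: Callable[[str, str], bool],
) -> list[str]:
    """Roll `image` out environment by environment.

    `current` maps environment -> running image and is updated in place.
    On the first unhealthy environment, that environment is rolled back
    to its previous image and promotion stops.
    Returns the environments that ended up running `image`.
    """
    promoted = []
    for env in ENVIRONMENTS:
        previous = current.get(env)
        current[env] = image  # deploy
        if not healthy(env, image):
            if previous is None:
                del current[env]  # nothing to roll back to
            else:
                current[env] = previous  # roll back
            break
        promoted.append(env)
    return promoted
```

This sketch only rolls back the failing environment and leaves earlier stages on the new image; rolling back every stage is an equally valid choice and would need to be settled as part of the deployment design.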
Reconciliation
https://gitlab.com/gitlab-com/gl-infra/platform/experimentation-spaces/-/issues/11
The Reconciler is the heart of the system. It is responsible for computing the desired state of the world (based on the service definition and current version), finding how it differs from the actual state, and then applying that diff. It will also require some form of storage.
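The desired-state / diff / apply cycle can be sketched as follows. This assumes, purely for illustration, that a service definition maps component names to image repositories and that "state" is just a mapping of component to running image; the `actual` dictionary stands in for the storage the Reconciler will need.

```python
from dataclasses import dataclass


@dataclass
class Diff:
    create: dict[str, str]
    update: dict[str, str]
    delete: set[str]


def desired_state(service_definition: dict[str, str], version: str) -> dict[str, str]:
    """Hypothetical rule: every component runs image `<repo>:<version>`."""
    return {name: f"{repo}:{version}" for name, repo in service_definition.items()}


def diff(desired: dict[str, str], actual: dict[str, str]) -> Diff:
    """Compare desired against actual state, component by component."""
    return Diff(
        create={k: v for k, v in desired.items() if k not in actual},
        update={k: v for k, v in desired.items() if k in actual and actual[k] != v},
        delete={k for k in actual if k not in desired},
    )


def apply(d: Diff, actual: dict[str, str]) -> None:
    """Apply a diff in place; a real system would call the runtime here."""
    actual.update(d.create)
    actual.update(d.update)
    for k in d.delete:
        del actual[k]


def reconcile(service_definition: dict[str, str], version: str,
              actual: dict[str, str]) -> Diff:
    """One reconciliation pass; returns the diff that was applied."""
    d = diff(desired_state(service_definition, version), actual)
    apply(d, actual)
    return d
```

Because each pass recomputes the diff from scratch, running `reconcile` again on a converged state yields an empty diff, which is what allows it to run in a loop.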
Runtime
https://gitlab.com/gitlab-com/gl-infra/platform/experimentation-spaces/-/issues/2
The runtime is responsible for actually scheduling and running the customer's workloads; it is what a deployment targets. The runtime will provide autoscaled compute resources with a degree of tenant isolation. It will also optionally expose an endpoint at which the workload can be reached. This endpoint will have a DNS name and be TLS-encrypted.
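Two runtime behaviours lend themselves to a short sketch: autoscaling and endpoint naming. For autoscaling, this uses a simplified version of the proportional formula from Kubernetes' Horizontal Pod Autoscaler (desired = ceil(current × metric / target), here without HPA's tolerance band), clamped to hypothetical bounds. For endpoints, `BASE_DOMAIN` and the `<workload>.<space>` naming scheme are assumptions; the `https://` scheme reflects the TLS requirement.

```python
import math
from typing import Optional

# Hypothetical base domain; the real DNS naming scheme is not decided.
BASE_DOMAIN = "spaces.example.com"


def desired_replicas(current: int, metric: float, target: float,
                     min_replicas: int = 1, max_replicas: int = 10) -> int:
    """Scale replicas in proportion to load, then clamp to [min, max]."""
    wanted = math.ceil(current * metric / target)
    return max(min_replicas, min(max_replicas, wanted))


def endpoint_url(workload: str, space: str, expose: bool) -> Optional[str]:
    """Return the workload's HTTPS URL, or None if it is not exposed."""
    if not expose:
        return None
    return f"https://{workload}.{space}.{BASE_DOMAIN}"
```

For example, two replicas running at 90% CPU against a 60% target scale to three, while an idle workload never drops below `min_replicas`.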
Observability
https://gitlab.com/gitlab-com/gl-infra/platform/experimentation-spaces/-/issues/12
The observability stack serves two purposes. First, it allows customers to operate their applications. Second, it connects to existing monitoring, alerting, and capacity planning processes owned by Infrastructure.
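One way to serve both purposes from the same data is to tag every series with the space it belongs to. The sketch below assumes, for illustration only, a Prometheus-style pipeline: samples are rendered in the Prometheus text exposition format with a hypothetical `space` label. Customers filter to their own space, while Infrastructure aggregates across spaces, for example for capacity planning. The metric names and sample shape are made up.

```python
from collections import defaultdict

# Hypothetical sample shape: (metric name, space, value).
Sample = tuple[str, str, float]


def _escape(value: str) -> str:
    """Escape a label value as the Prometheus text format requires."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def to_prometheus_text(samples: list[Sample]) -> str:
    """Render one line per sample, each carrying the `space` label."""
    lines = [f'{name}{{space="{_escape(space)}"}} {value}' for name, space, value in samples]
    return "\n".join(lines) + "\n"


def for_space(samples: list[Sample], space: str) -> list[Sample]:
    """Customer view: only the series belonging to one space."""
    return [s for s in samples if s[1] == space]


def totals_by_metric(samples: list[Sample]) -> dict[str, float]:
    """Infrastructure view: each metric summed across all spaces."""
    totals: defaultdict[str, float] = defaultdict(float)
    for name, _, value in samples:
        totals[name] += value
    return dict(totals)
```

Keeping a single labelled stream, rather than separate customer and Infrastructure pipelines, means both views are guaranteed to agree.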
Diagram
Note: Service Developers (formerly "Customers") are GitLab team members who develop and deploy the "experiment" and the code within it.
Next steps
- Get team sign-off on architecture
- Design interfaces
- Begin design for each component