Define an initial policy for Runway service size
Runway is doing an amazing job of encouraging GitLab services to be built on a consistent platform, allowing application engineers to build on top of an ever-improving platform layer offering observability, provisioning, security benefits and many other advantages to application development teams.
One question that has started to come up is "how big should services be?". As a company, our monolithic architecture has historically worked well for us, and while Runway allows for the potential proliferation of services, it's important that we encourage services of the right size.
The Platform team should define some broad policies around when to build into an existing codebase versus when to create a new service, even if those policies start small and are updated incrementally.
This came up recently in the context of gitlab-org/modelops/applied-ml/code-suggestions/ai-assist#527 (closed). It's likely that as adoption ramps up there will be many more discussions of this sort, so having some guidelines may help direct these decisions.
Some considerations for deciding on the right size for services:
- Is the Runway deployment going to be shipped to Self-Managed or Dedicated customers?
  - Broadly, the policy should only apply to external services.
  - Internal-only services are much easier to manage, and having a large number of small internal services is probably reasonable.
  - Having lots of small services that Support teams are expected to support and Self-Managed Operations teams are expected to operate may add complexity.
- Conway's Law considerations: are the services built by two teams or a single team? If two teams, having multiple services may be more reasonable.
- Shared codebase: if two services share the majority of their codebase, this may be an indicator that they belong together.
- Dependencies: if two services are interdependent (i.e., have circular dependencies), this is a strong indicator that they may be better as a single service.
- Coupling: how tightly coupled are the services? Tightly coupled services may be better as a single service.
- Requirement to support both gRPC and HTTP(S): this is not, on its own, a good reason to separate services, as Runway supports multiple deployments.
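To make the discussion more concrete, the considerations above could be read as a simple checklist. The following is only a rough sketch, not a proposed policy: the `ServicePair` fields and the `reasons_to_combine` function are hypothetical names invented for illustration, and treating every signal as equally important is an assumption.

```python
from dataclasses import dataclass


@dataclass
class ServicePair:
    """Hypothetical answers to the considerations for two candidate services."""
    shipped_to_customers: bool    # deployed to Self-Managed or Dedicated
    same_team: bool               # Conway's Law: both built by a single team
    mostly_shared_codebase: bool  # the majority of the code is shared
    circular_dependency: bool     # each service depends on the other
    tightly_coupled: bool         # changes to one usually require changes to the other


def reasons_to_combine(pair: ServicePair) -> list[str]:
    """Return the signals suggesting the two services belong together.

    Needing both gRPC and HTTP(S) is deliberately not a signal here,
    since it is not a good reason to separate services.
    """
    # Assumption: the policy only applies to externally shipped services.
    if not pair.shipped_to_customers:
        return []

    signals = {
        "built by a single team": pair.same_team,
        "share most of their codebase": pair.mostly_shared_codebase,
        "have circular dependencies": pair.circular_dependency,
        "are tightly coupled": pair.tightly_coupled,
    }
    return [reason for reason, present in signals.items() if present]
```

An empty list would suggest that nothing in the checklist argues against separate services; a long list would suggest the discussion is worth having before a new service is created.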
What other factors are there?
I think the goal of a policy document is to build an organization-wide understanding of what the right size for services is. Over the medium or long term, we don't want to accidentally create a burden, particularly on Self-Managed administrators, Support Engineers, or Dedicated Environment Automation, by producing a very large number of different services.
The doc could be hosted in either the Handbook or the Runway docs (whichever is chosen should be cross-referenced from the other).