# Proposal: Dedicated Cloud Connector service
## Problem
A common misconception in discussions so far is that there is a "Cloud Connector box" (in the abstract sense), which I believe most people envision as a web service of some kind, and which acts as the single ingress for self-managed traffic into GitLab, fronting all other services we provide.
As per the latest revision of the AI architecture, we currently have no such service (or abstraction in any sense of the word). The primary reason is that it was not necessary for the Cloud Connector MVC (Code Suggestions), and it would have added significant complexity without providing clear benefits.
However, with the first CC feature released, a second one in flight (Duo Chat), and several others under consideration, we are at a point now where it makes sense to explore what benefits and drawbacks such an abstraction may have.
We can brainstorm in this issue. If there is consensus that this is worth exploring more deeply, the deliverable for this issue can be an Architecture Blueprint describing the suggested change.
## Where abstractions may be needed
A welcome property of the current AI architecture is that SM/Dedicated and SaaS work almost identically, which allows us to iterate quickly. The only major difference is in how these instances obtain access tokens to call into GitLab-hosted services (currently: the AI gateway). This works well in the context of existing and future stateless AI features, but will get more challenging as we add more diverse features over time.
I see several areas where we may need more centralized abstractions once we move past GA for the first 2-3 AI features:
- Non-AI features. The most obvious ones are use cases that do not relate to AI at all. Much of the existing functionality has been built out in the AI gateway and AI abstraction layer, which act as the entry points for existing Cloud Connector features. Clearly, this means that if we were to provide access to, say, a CI/CD runners feature hosted by a then-existing `ci-runners-gateway` service, we would not only double the entry points into CC features, we would also have to either duplicate shared concerns across these services or extract them somewhere (where?).
- Authentication/Authorization. Authentication (who are you?) and authorization (can you do this?) are currently implemented across 3 different systems for Cloud Connector:
  1. gitlab-rails: receives a request and performs ordinary request auth/z as it would with any API call
  2. AI gateway: verifies authenticity of access tokens received from gitlab-rails
  3. CustomersDot: issues access tokens to gitlab-rails (SM only)
  While 1 and 3 make perfect sense, as per the first bullet point, for any non-AI feature we would either have to extract these checks from the AI gateway and make them available as a library or separate service, or duplicate them in every service that sits downstream from Rails and participates in CC features.
- Telemetry. Telemetry has proliferated: it is spread across a number of systems and clients, and is not implemented consistently. Moving into more diverse product spaces makes this even more challenging. There may also be CC-specific telemetry that is not relevant to SaaS, such as unique instance count (for SaaS, the instance count is always 1). This kind of telemetry might be better tracked in a service or module specific to CC.
- Rate limiting/request budgets. While we have existing mechanisms to throttle traffic into GitLab infrastructure at large, we may want more fine-grained control (perhaps even accessible via some form of UI) over which CC customers can use what, and to what extent. Such controls would best be isolated to a CC-specific application layer or web service.
- Single entry point for customers. We want to keep the configuration customers have to apply for CC to work, such as proxy and firewall config, to a minimum. For now this is just the AI gateway, but if we were to run N such services, customers should not have to be concerned with that. While we can mitigate this to some extent with DNS and wildcard permissions, we know not all customers are comfortable with that.
- Reducing risk surface. If we follow the current pattern, which is to expose the AI gateway to the public internet, we would increase our risk surface with every additional feature service Cloud Connector requires to operate. If instead they were fronted by a single public CC web service, they could be privately routed and made inaccessible to any clients other than GitLab Inc services we trust.
## Exploration: How a Cloud Connector service could fit in
### Properties
- Single entry point for CC features. All GitLab instances (including SaaS) talk to this service, reachable e.g. at `cloud.gitlab.com`.
- Private vs public-facing services. All services sitting downstream from the CC service (i.e. the AI gateway and other prospective feature gateways) do not face the public internet; they are deployed into GitLab's cloud via Runway and are only reachable from other services in GitLab's cloud.
- Authentication moves to CC service. Currently, the AI gateway verifies the authenticity and scope of each incoming request. It must do so because it cannot trust public clients. Were the AI gateway privately routed, and were the CC service to carry out these checks instead, we could simplify the AI gateway (and other such services) to be merely API facades around some kind of cloud-connected functionality (such as 3P AI models).
- Request budgets. If the CC service operated independently, it could be stateful and allow us to define per-customer request budgets, if this is something we choose to do. It could do that by talking to connected storage or services to establish what quota a customer has purchased and how much of it has been exhausted. We could even envision a UI on top of this, providing convenient controls that could be steered by a product manager or engineers, not just SREs.
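As a sketch of how such application-level budgets could work, here is a fixed-window counter held in memory; in practice this state would live in the connected storage mentioned above, and all names here are hypothetical:

```python
import time
from dataclasses import dataclass, field


@dataclass
class CustomerBudget:
    """Hypothetical per-customer quota, e.g. loaded from purchase data."""
    limit: int       # requests allowed per window
    window_s: float  # window length in seconds
    used: int = 0
    window_start: float = field(default_factory=time.monotonic)


class BudgetTracker:
    """Fixed-window budget enforcement, as the CC service might apply it."""

    def __init__(self) -> None:
        self._budgets: dict[str, CustomerBudget] = {}

    def register(self, customer_id: str, limit: int, window_s: float) -> None:
        self._budgets[customer_id] = CustomerBudget(limit, window_s)

    def allow(self, customer_id: str) -> bool:
        budget = self._budgets.get(customer_id)
        if budget is None:
            return False  # unknown customer: reject at the front door
        now = time.monotonic()
        if now - budget.window_start >= budget.window_s:
            # Window elapsed: reset the counter.
            budget.window_start = now
            budget.used = 0
        if budget.used >= budget.limit:
            return False
        budget.used += 1
        return True
```

The point is that this kind of stateful, per-customer decision is awkward to place in Cloudflare or in each feature gateway, but natural in a single CC-specific layer.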
### Benefits & drawbacks
#### Benefits
- Single point of access for GitLab-hosted CC features. By fronting internal services with a single public service, we reduce risk surface and make instance configuration easier. We could even consider moving access to CustomersDot behind this service, which would allow us to track and enrich these calls with additional data without making changes to CustomersDot itself, and would give us a better separation of concerns.
- Co-locating CC specific logic. While that will not work for everything, some code modules that purely exist to support CC use cases could move here, thus tightening domain boundaries. One example is auth/z.
- Independently scalable. We would be able to scale this service independently of the AI gateway. Especially in conjunction with defining request budgets, this could be useful or necessary.
- Can be stateful. It would allow us to query attached resources like storage. One case where this would be necessary is when enforcing some form of per-customer or per-org request budget at the application level (instead of at a front door service like Cloudflare).
#### Drawbacks
- Introducing a new service. I think this in itself is a drawback at first, since it comes with numerous complexities that we otherwise wouldn't have:
  - Separate repo + CI/CD setup required.
  - Separate deployment pipeline to manage with Runway.
  - Separate dashboards and telemetry to add and monitor.
  - Providing on-call coverage and writing runbooks.
- Additional latency. Because the CC service would front the actual per-feature services, it adds an extra network hop, introducing additional latency even for latency-sensitive features like Code Suggestions.
## Open questions
- Which tech stack would we choose? Ruby, Go, Python, etc.?
- API vs router. I see two competing ways this service could operate (or perhaps a hybrid of both):
  - As a router/proxy. In this approach, which Workhorse takes as well, no dedicated endpoints are defined. Incoming requests are intercepted, and a routing module decides how to deal with them, e.g. "a code suggestion request goes to codesuggestions.gitlab.com". Pros: no need to mimic or duplicate every single internal feature API endpoint. Cons: would still tightly couple clients (instances) to downstream APIs, which is a leaky abstraction.
  - As a facade/adapter. In this approach, the CC service would be an ordinary web service, likely with some sort of JSON-based HTTP API. Pros: provides better isolation of the underlying backends. Cons: requires re-building every single endpoint for clients.