Unified AI Gateway Architecture
This proposal is in response to #159 (closed) and https://gitlab.com/gitlab-org/gitlab/-/issues/414408, and follows discussions between various stakeholders on a potential route forward.
Executive Summary
- The changes announced in #159 (closed) did not address the concerns raised about the approach, e.g. the notes from @stanhu and @andrewn.
- This proposal incorporates the best of both approaches: it avoids bundling GitLab.com-only infrastructure into the GitLab application, ensures that SaaS-only services do not run inside our existing monolith, and avoids availability issues for GitLab.com.
- This approach ensures parity in our AI functionality between GitLab.com and self-managed without having to replicate code in different services.
Diagram
Headline Changes
- All requests traverse the GitLab instance and the AI Abstraction Layer
  - IDE extensions are routed to the GitLab API (/api/v4/...), not to the current separate endpoint (codesuggestions.gitlab.com).
  - IDE extensions will route self-managed code-suggestions to the user's local GitLab instance, not a shared instance.
  - Inside the monolith, all AI requests will be routed through the AI Abstraction Layer to the AI Gateway.
- All GitLab instances will route AI requests via AI Gateway infrastructure
  - The AI Gateway will add third-party AI provider authentication headers before forwarding requests on to providers and APIs.
  - The AI Gateway will provide monitoring, metrics and performance measurement across all GitLab AI requests (including GitLab.com, Self-Managed, Dedicated, FedRAMP etc).
  - Multiple instances of the AI Gateway can be deployed in different regulatory domains (i.e. America, Europe, Japan, FedRAMP) to allow Self-Managed and Dedicated customers to use AI services while complying with regulatory restrictions.
  - Self-Managed instances will not communicate with GitLab.com. Doing so would lead to scalability, availability and code complexity issues, and would be unusable for compliance reasons for many Self-Managed and Dedicated customers, and for all FedRAMP customers (in future).
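The header-injection step described above can be illustrated with a minimal sketch. It is not the gateway's actual code: the provider names, hosts, header names, credentials and the `build_upstream_request` helper are all placeholders invented for illustration. The point is only that credentials sent by the calling GitLab instance are dropped and the provider credential, held solely by the gateway, is added before forwarding.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Provider:
    base_url: str     # where the gateway forwards requests
    auth_header: str  # header the provider expects credentials in
    credential: str   # secret held only by the gateway


# Hypothetical provider table; hosts, header names and secrets are placeholders.
PROVIDERS = {
    "provider-a": Provider("https://api.provider-a.example", "x-api-key", "secret-a"),
    "provider-b": Provider("https://api.provider-b.example", "authorization", "Bearer secret-b"),
}

# Headers that are never passed through from the calling GitLab instance.
STRIPPED_HEADERS = {"authorization", "x-api-key", "host"}


def build_upstream_request(provider_name: str, path: str, headers: dict, body: bytes) -> dict:
    """Return the request the gateway would send to the provider."""
    provider = PROVIDERS[provider_name]
    # Drop instance-side credentials so they never reach the provider.
    upstream_headers = {
        name: value
        for name, value in headers.items()
        if name.lower() not in STRIPPED_HEADERS
    }
    # Inject the provider credential that only the gateway knows.
    upstream_headers[provider.auth_header] = provider.credential
    return {"url": provider.base_url + path, "headers": upstream_headers, "body": body}
```

Because only the gateway holds provider credentials in this sketch, a Self-Managed instance never needs to be configured with them.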
Walkthroughs of an AI request
This section describes some scenarios for an AI request.
GitLab.com Ultimate User using Code Suggestions
- The user is using an IDE extension and types some code expecting a suggestion. (See 1️⃣ on diagram)
- The IDE extension makes a request to GitLab.com's API, passing the user's personal access token (PAT) along with the request. (2️⃣)
- GitLab.com authenticates the user, validates their token and checks whether they have access to the AI feature.
- GitLab.com passes the request to the AI Abstraction layer, which generates a request, along with a JWT, to the AI Gateway, ai-gateway-us-east-1.gitlab.com. (3️⃣)
- Using the JWT, the AI Gateway cryptographically verifies that the request is from GitLab.com.
- The AI Gateway injects authentication headers and forwards the request on to the appropriate provider. (4️⃣)
- Metrics for the request, including accept rates, are recorded centrally in the AI Gateway service. (5️⃣)
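The JWT step in this walkthrough can be sketched as follows. The proposal does not specify the signing scheme, so this sketch assumes HS256 with a shared secret purely to stay within the Python standard library; a real deployment might well use an asymmetric algorithm and a maintained JWT library instead. The function names and the `iss`/`exp` checks are illustrative assumptions, not the gateway's actual interface.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def sign_jwt(claims: dict, key: bytes) -> str:
    """Issue a compact HS256 JWT (the role played by the AI Abstraction Layer)."""
    header = _b64url(json.dumps({"alg": "HS256", "typ": "JWT"}, separators=(",", ":")).encode())
    payload = _b64url(json.dumps(claims, separators=(",", ":")).encode())
    signature = hmac.new(key, f"{header}.{payload}".encode("ascii"), hashlib.sha256).digest()
    return f"{header}.{payload}.{_b64url(signature)}"


def verify_jwt(token: str, key: bytes, expected_issuer: str, now=None) -> dict:
    """Check signature, issuer and expiry (the role played by the AI Gateway)."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("malformed token")
    header_b64, payload_b64, signature_b64 = parts
    if json.loads(_b64url_decode(header_b64)).get("alg") != "HS256":
        raise ValueError("unexpected algorithm")
    expected = hmac.new(key, f"{header_b64}.{payload_b64}".encode("ascii"), hashlib.sha256).digest()
    # Constant-time comparison avoids leaking how many signature bytes matched.
    if not hmac.compare_digest(expected, _b64url_decode(signature_b64)):
        raise ValueError("bad signature")
    claims = json.loads(_b64url_decode(payload_b64))
    if claims.get("iss") != expected_issuer:
        raise ValueError("unexpected issuer")
    if claims.get("exp", 0) <= (time.time() if now is None else now):
        raise ValueError("token expired")
    return claims
```

In this sketch the gateway rejects any request whose token fails one of the checks before any provider credential is attached.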
Self-Managed GitLab gitlab.example.eu Ultimate User using Code Suggestions
- The user is using an IDE extension and types some code expecting a suggestion.
- The IDE extension makes a request to the gitlab.example.eu API, passing the user's personal access token (PAT) along with the request.
- gitlab.example.eu authenticates the user, validates their token and checks whether they have access to the AI feature.
- gitlab.example.eu passes the request to the AI Abstraction layer, which generates a request, along with a JWT, to the instance's configured AI Gateway, ai-gateway-eu.gitlab.com.
- The JWT is verified (details TBD).
- The AI Gateway injects authentication headers and forwards the request on to the appropriate provider.
- Metrics for the request, including accept rates, are recorded centrally in the AI Gateway service.
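As a rough sketch of how an instance might address its configured AI Gateway, the snippet below builds the outgoing request from per-instance settings. Everything beyond the two hostnames taken from this proposal is an assumption: the `InstanceSettings` type, the `/v1/<feature>` path and the `X-Gitlab-Instance` header are hypothetical, and the JWT is treated as an opaque string.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class InstanceSettings:
    instance_host: str    # e.g. "gitlab.example.eu"
    ai_gateway_host: str  # e.g. "ai-gateway-eu.gitlab.com"


def build_gateway_request(settings: InstanceSettings, feature: str, payload: dict, jwt: str) -> dict:
    """Describe the request the AI Abstraction Layer would send to the gateway."""
    return {
        # Hypothetical path layout; the proposal does not define gateway routes.
        "url": f"https://{settings.ai_gateway_host}/v1/{feature}",
        "headers": {
            "Authorization": f"Bearer {jwt}",
            "Content-Type": "application/json",
            # Hypothetical header identifying the calling instance.
            "X-Gitlab-Instance": settings.instance_host,
        },
        "body": json.dumps(payload),
    }
```

With this shape, pointing an instance at a gateway in a different regulatory domain is a settings change rather than a code change.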
Self-Managed GitLab gitlab.example.eu Ultimate User using Chat
- The user posts a chat message to the chatbot running on gitlab.example.eu.
- gitlab.example.eu passes the request to the AI Abstraction layer, which generates an AI request, along with a JWT, to the instance's configured AI Gateway, ai-gateway-eu.gitlab.com.
- The JWT is verified (details TBD).
- The AI Gateway injects authentication headers and forwards the request on to the appropriate provider.
- Metrics for the request, including accept rates, are recorded centrally in the AI Gateway service.
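To illustrate what recording accept rates centrally could look like, here is a small in-memory sketch. The `AcceptRateRecorder` class and the deployment labels are hypothetical; a real gateway would presumably export counters to its existing metrics stack rather than hold them in process memory.

```python
from collections import defaultdict
from typing import Optional


class AcceptRateRecorder:
    """Counts shown and accepted suggestions per deployment type."""

    def __init__(self) -> None:
        self._shown = defaultdict(int)
        self._accepted = defaultdict(int)

    def record(self, deployment: str, accepted: bool) -> None:
        self._shown[deployment] += 1
        if accepted:
            self._accepted[deployment] += 1

    def accept_rate(self, deployment: Optional[str] = None) -> float:
        """Rate for one deployment, or across all deployments when None."""
        if deployment is None:
            shown = sum(self._shown.values())
            accepted = sum(self._accepted.values())
        else:
            shown = self._shown.get(deployment, 0)
            accepted = self._accepted.get(deployment, 0)
        return accepted / shown if shown else 0.0
```

Because every deployment type reports to the same recorder in this sketch, a single query gives either a per-deployment or a global accept rate.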
Considered alternatives
What are the alternatives?
There are two alternative approaches. Their disadvantages are described below.
Alternative 1: Proxy Self-Managed + Dedicated Requests through the AI Gateway, use providers directly for GitLab.com
This approach uses different routes for GitLab.com and Self-Managed.
- This adds complexity to the application, which will need to be built with all model clients even though Self-Managed instances will not use them and will instead forward requests to the AI Gateway.
- Functionality will need to be synchronised between the AI Abstraction Layer and the AI Gateway, to ensure parity.
- No central logging or metrics.
Reasons to select the Unified AI Gateway over Alternative 1
- Alternative 1 requires additional client logic in multiple places, whereas with the Unified AI Gateway we can dogfood the AI Gateway on both GitLab.com and Self-Managed and demonstrate to our customers that we use the same tooling for both.
- Alternative 1 would be a very poor prototype for other GitLab Plus Services in future, which would also require the duplicated functionality, leading to further complexity.
Alternative 2: Forward GitLab Self-Managed + Dedicated Requests to GitLab.com, no AI Gateway.
This approach would carry the greatest set of drawbacks, and from an Infrastructure perspective is the worst of all combinations.
- It creates a great deal of additional traffic on GitLab.com, which will lead to scaling and availability issues.
- It crams non-application domain data into the GitLab application.
- It cannot be used in regulated or high-compliance environments, including for many banking, Dedicated and government customers.
- @stanhu added counterarguments to this approach in #159 (comment 1432697053).
- @andrewn added counterarguments to this approach in https://gitlab.com/gitlab-org/gitlab/-/issues/414408#note_1429061838.
Reasons to select the Unified AI Gateway over Alternative 2
- Lower risk of saturation and utilization spikes on GitLab.com.
- A much stronger privacy stance.
- It avoids GitLab-to-GitLab communications, which would be complicated, particularly since we have no control over the "client" GitLab versions, which will not match GitLab.com's version and may be out of date by multiple releases.
List of changes needed to adopt this approach
- Add a new code_suggestions endpoint to the GitLab API.
  - This endpoint should send requests via the AI Abstraction layer.
- Update the VS Code extension to forward requests to the source GitLab instance rather than a static endpoint.
- AI Abstraction layer should forward requests to the AI Gateway, instead of directly to the AI providers (as is happening at present on GitLab.com).
- AI Gateway can be bootstrapped as the existing Model Gateway. It will need to add per-provider endpoints for Anthropic, OpenAI etc.
- When these endpoints are requested, the AI Gateway injects an authentication header, records telemetry details (as is being done already) and then forwards the request to a provider.
- AI Gateway needs to be able to verify self-managed requests against GitLab.com (in future this should be migrated to another source).
Once this is done, self-managed, dedicated and GitLab.com will all be able to use AI features and provide unified telemetry.
Questions and Answers
- What about the risk of scaling the AI Gateway to handle hundreds of thousands of concurrent users?
  - Scaling a small and stateless (async/libuv) service is much easier than scaling the synchronous Rails monolith running GitLab.com.
  - Pushing this traffic through GitLab.com would lead to onward pressure on our main PostgreSQL database, Redis, etc.
  - The observability work for managing this scaling has mostly already been done.
  - An HPA can be added for dynamic scaling with ease.
- How do we go about authenticating GitLab Self-Managed and GitLab Dedicated customers?
  - For the first iteration, work is already underway to utilize GitLab.com for this, but this is a temporary solution.
  - Over the longer term it would be better to migrate this authentication traffic to CustomersDot or another more appropriate source.
- How long will this take to deliver?
  - None of the changes described in the previous section should take long. This would be weeks at most.