[Investigation] Client <> AI Gateway Architecture changes for Code Suggestions (Post GA)
Description
The emphasis placed on Code Suggestions Performance improvements (&12160) signals a change in requirement prioritization. Performance is now a high-priority quality attribute requirement. In the past we concluded that we were willing to make a performance trade-off, but that may no longer be the case.
Let's revisit the discussion of whether clients should proxy through the monolith or connect directly to the AI Gateway.
Advantages of skipping the Monolith
In addition to the improved latency, here are the advantages of not relying on the Monolith for every code completion call:
- Assuming more product changes go directly into the AI Gateway or the editor extensions, these components can be deployed more quickly and do not require self-managed (SM) customers to upgrade their GitLab version.
- We can experiment with proposed changes more easily in a single location (the AI Gateway).
What should our latency be?
Our aspirational goal is a P90 latency of 500 ms for code completion. While this is unlikely to be achieved in the first few iterations, minimizing latency is crucial to the success of Code Suggestions.
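To make the target concrete, here is a minimal sketch of how a P90 could be computed from a batch of measured request latencies and compared against the 500 ms goal. It uses the nearest-rank definition of a percentile, which is only one of several common definitions. The function names and sample values are hypothetical and are not taken from any existing GitLab tooling.

```python
import math

P90_TARGET_MS = 500  # aspirational goal stated in this issue


def percentile(samples_ms, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of samples <= it."""
    if not samples_ms:
        raise ValueError("need at least one sample")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples_ms)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def meets_target(samples_ms, target_ms=P90_TARGET_MS):
    """True when the P90 of the samples is at or below the target."""
    return percentile(samples_ms, 90) <= target_ms


# Hypothetical end-to-end latencies (ms) for ten completion requests.
samples = [120, 180, 250, 300, 310, 330, 400, 450, 480, 900]
print(percentile(samples, 90), meets_target(samples))
```

Note that a single slow outlier (900 ms here) does not move the P90, which is why the goal is phrased as a percentile and not as a maximum.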
Proposal/Questions
In the spirit of this issue, there is a scalability and performance win when clients can connect directly to the service providing AI features.
What if SaaS and self-managed GitLab monoliths acted as a service registry for the AI Gateway (and possibly other services in the future)? Maybe this is the intent of Cloud Connector? The communication could look like this:
sequenceDiagram
    autonumber
    participant C as Client
    participant G as GitLab
    participant AI as AI Gateway
    C->>G: POST GitLab<br>with "/ai_gateway_auth"
    G->>C: return `jwt`, `ai_url`, `expires`
    loop Each code suggestions request
        alt token is about to expire?
            C->>G: POST GitLab<br>with "/ai_gateway_auth"
            G->>C: return `jwt`, `ai_url`, `expires`
        end
        C->>AI: POST `ai_url`/completions<br>with `jwt`
        AI->>C: Response
    end
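The client side of this flow could be sketched roughly as follows. This is an illustration only. It assumes `expires` is a Unix timestamp in seconds and that "about to expire" means "within a configurable margin". The transport is passed in as two callables (`fetch_auth` for the `/ai_gateway_auth` call, `post_completion` for the AI Gateway call), so the sketch does not depend on any particular HTTP library. All class, function, and parameter names are hypothetical.

```python
import time
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class GatewayAuth:
    jwt: str
    ai_url: str
    expires: float  # assumption: Unix timestamp in seconds


class GatewayClient:
    """Caches the auth response and refreshes it shortly before it expires."""

    def __init__(
        self,
        fetch_auth: Callable[[], GatewayAuth],
        post_completion: Callable[[str, str, Any], Any],
        refresh_margin_s: float = 30.0,
        now: Callable[[], float] = time.time,
    ):
        self._fetch_auth = fetch_auth            # wraps POST /ai_gateway_auth on GitLab
        self._post_completion = post_completion  # wraps POST <ai_url>/completions
        self._refresh_margin_s = refresh_margin_s
        self._now = now
        self._auth: Optional[GatewayAuth] = None

    def _current_auth(self) -> GatewayAuth:
        # Ask GitLab again only when there is no token yet or it is about to expire.
        if self._auth is None or self._auth.expires - self._now() <= self._refresh_margin_s:
            self._auth = self._fetch_auth()
        return self._auth

    def complete(self, payload: Any) -> Any:
        auth = self._current_auth()
        # The request goes straight to the AI Gateway, bypassing the monolith.
        return self._post_completion(f"{auth.ai_url}/completions", auth.jwt, payload)
```

With this shape, the monolith is contacted once per token lifetime instead of once per completion request, which is where the latency and scalability win described above would come from.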
The `jwt` token returned by GitLab can be scoped to the requesting GitLab user. Right now these tokens are instance-level tokens. Could GitLab instances act as OAuth applications that are registered with or deregistered from the AI Gateway based on the instance-level subscription?
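As an illustration of what user scoping could mean in practice, the sketch below issues and verifies a token whose claims identify both the issuing instance and the requesting user. It relies on assumptions that this issue does not settle: it uses HS256 (a shared secret) purely to stay within the Python standard library, whereas a real deployment would more likely use asymmetric signatures. Mapping the instance to `iss` and the user to `sub` is likewise a guess. `issue_token` and `verify_token` are hypothetical names.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(secret: bytes, signing_input: str) -> bytes:
    return hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()


def issue_token(secret, instance_id, user_id, ttl_s=3600, now=None):
    """GitLab side: sign claims that name the instance (iss) and the user (sub)."""
    issued_at = int(time.time() if now is None else now)
    header = {"alg": "HS256", "typ": "JWT"}
    claims = {"iss": instance_id, "sub": user_id, "iat": issued_at, "exp": issued_at + ttl_s}
    parts = [_b64url(json.dumps(p, separators=(",", ":")).encode()) for p in (header, claims)]
    signing_input = ".".join(parts)
    return signing_input + "." + _b64url(_sign(secret, signing_input))


def verify_token(secret, token, now=None):
    """AI Gateway side: check signature and expiry, then return the claims."""
    pieces = token.split(".")
    if len(pieces) != 3:
        raise ValueError("malformed token")
    header_b64, claims_b64, sig_b64 = pieces
    if json.loads(_b64url_decode(header_b64)).get("alg") != "HS256":
        raise ValueError("unexpected algorithm")
    expected = _sign(secret, header_b64 + "." + claims_b64)
    if not hmac.compare_digest(expected, _b64url_decode(sig_b64)):
        raise ValueError("bad signature")
    claims = json.loads(_b64url_decode(claims_b64))
    if (time.time() if now is None else now) >= claims["exp"]:
        raise ValueError("token expired")
    return claims
```

With per-user claims, the AI Gateway could apply rate limits or experiments per user without calling back into the monolith. An instance-level token would carry only the `iss` part of this information.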
References
- Architecture blueprint: AI Gateway
- Architecture blueprint: Cloud Connector gateway service
- Documentation: AI architecture
- Documentation: Cloud Connector MVC: Code Suggestions for Self-Managed/GitLab Dedicated