Architecture for Custom Models
Summary
Our current architecture routes all AI features through the same base models for GitLab.com, Dedicated, and Self-Managed GitLab customers, via a single instance of the AI Gateway. Every AI feature request is first routed through the GitLab monolith, which confirms that the user is authorized and has access to the AI feature via a GitLab group or GitLab instance with an AI feature license.
With Custom Models, it makes sense to re-evaluate this architecture and form a plan that supports the single model as well as additional models that customers may choose to run themselves.
The definition of done for this issue is a new iteration to the AI architecture blueprint.
Problem areas to consider
- Multiple AI Gateways or just one: does each self-managed/dedicated customer have their own AI Gateway or do all customers use the same central AI Gateway service to access their Custom Models? For air-gapped GitLab instances, having a separate AI Gateway is probably required so that the AI Gateway can be offline / live in the same network as the GitLab instance.
- Authentication and authorization live in the Monolith today: if clients were to connect directly to the AI Gateway, what would authn/authz look like? The first iteration of AI Gateway auth for Code Suggestions used a JWT with an expiration of 1 hour. This works, but it also creates opportunities for abuse because JWTs are non-revocable.
- AI feature logic largely lives in the Monolith today: code completion LLM requests require little instruction, so the only piece of the "prompt" that needs to be sent is the code itself and any nearby code. For every other feature, we construct a prompt using context from the Monolith plus other hard-coded prompt information. For example, here is an explanation of why code generation requests must go through the monolith.
- Bypassing the monolith means moving logic to the AI Gateway: if we change the architecture to bypass the monolith, some components of the AI feature flow will still need to pull data from the monolith, which contains user and membership information. If we start pulling more logic into the AI Gateway and run multiple AI Gateways to support custom models, customers will end up running different versions of GitLab, different versions of the AI Gateway, and different versions of the IDE client. Dealing with version compatibility could become a nightmare. This point came up here.
- Self-managed/dedicated and GitLab.com parity: Any solutions we come up with should work for customers on all GitLab deployments. We should avoid having different codepaths/architecture for gitlab.com and non-gitlab.com deployments.
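To make the "multiple AI Gateways or just one" question concrete, here is a minimal sketch of how an instance might resolve which gateway to call. All names here (`InstanceConfig`, `resolve_gateway_url`, the URLs) are hypothetical illustrations, not actual GitLab code or endpoints.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical central endpoint shared by connected instances.
CENTRAL_GATEWAY_URL = "https://cloud.example.com/ai"

@dataclass
class InstanceConfig:
    air_gapped: bool
    # Set when the customer runs their own AI Gateway.
    self_hosted_gateway_url: Optional[str] = None

def resolve_gateway_url(config: InstanceConfig) -> str:
    """Pick which AI Gateway an instance should talk to."""
    if config.air_gapped:
        # An offline instance cannot reach a central service, so a
        # gateway inside the customer's own network is required.
        if not config.self_hosted_gateway_url:
            raise ValueError("air-gapped instances must run their own AI Gateway")
        return config.self_hosted_gateway_url
    # Connected instances can share the central gateway, or override it
    # with a self-hosted one for custom models.
    return config.self_hosted_gateway_url or CENTRAL_GATEWAY_URL
```

The sketch shows the two deployment shapes under discussion: a shared central gateway versus a per-customer gateway, with air-gapped instances forcing the latter.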
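The JWT concern in the authn/authz bullet can be illustrated with a self-contained sketch (standard library only, not GitLab's actual token code): a signed token with a one-hour `exp` claim can be verified offline, but nothing in the scheme lets the issuer revoke it before it expires.

```python
import base64
import hashlib
import hmac
import json
import time

SECRET = b"demo-secret"  # illustrative only; a real deployment uses a managed key

def _b64(data: bytes) -> str:
    # JWT uses URL-safe base64 without padding.
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def issue_token(user_id: str, ttl_seconds: int = 3600) -> str:
    """Issue an HS256-style token that expires after ttl_seconds (1 hour by default)."""
    header = _b64(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = _b64(json.dumps({"sub": user_id, "exp": int(time.time()) + ttl_seconds}).encode())
    signing_input = f"{header}.{payload}".encode()
    sig = _b64(hmac.new(SECRET, signing_input, hashlib.sha256).digest())
    return f"{header}.{payload}.{sig}"

def verify_token(token: str) -> dict:
    """Verify the signature and expiry. Note there is no revocation check:
    a leaked token stays valid until exp passes."""
    header, payload, sig = token.split(".")
    signing_input = f"{header}.{payload}".encode()
    expected = _b64(hmac.new(SECRET, signing_input, hashlib.sha256).digest())
    if not hmac.compare_digest(sig, expected):
        raise ValueError("bad signature")
    claims = json.loads(base64.urlsafe_b64decode(payload + "=" * (-len(payload) % 4)))
    if claims["exp"] < time.time():
        raise ValueError("token expired")
    return claims
```

Because verification needs only the shared secret and the clock, the gateway never consults the issuer, which is exactly why the token cannot be revoked early.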
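The contrast in the feature-logic bullet can be sketched as two payload builders. Function names, fields, and the prompt wording below are invented for illustration; they are not GitLab's actual prompt templates.

```python
def completion_payload(prefix: str, suffix: str) -> dict:
    # Code completion needs almost no instruction: the surrounding
    # code *is* the prompt, so in principle the gateway could serve
    # this request without the monolith in the path.
    return {"prompt": prefix, "suffix": suffix}

def explain_code_prompt(code: str, language: str, user_role: str) -> str:
    # Other features combine hard-coded instructions with context that
    # only the monolith has (project metadata, user information), which
    # is why those requests go through the monolith today.
    return (
        f"You are assisting a {user_role}.\n"
        f"Explain the following {language} code:\n"
        f"```\n{code}\n```"
    )
```

The first function carries no feature logic at all; the second shows how quickly prompt construction accumulates monolith-owned context.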
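The version-compatibility worry in the bypass bullet can be made concrete with a toy compatibility check. The policy below (matching major versions, gateway no older than the instance) is an invented example, not a real GitLab support matrix.

```python
def parse(version: str) -> tuple:
    """Parse a dotted version string like '16.9.0' into a comparable tuple."""
    return tuple(int(part) for part in version.split("."))

def compatible(gitlab: str, gateway: str, client: str) -> bool:
    # Example policy only: all three independently shipped components
    # (GitLab instance, AI Gateway, IDE client) must share a major
    # version, and the gateway may not be older than the instance.
    g, gw, c = parse(gitlab), parse(gateway), parse(client)
    return g[0] == gw[0] == c[0] and gw >= g
```

With one central gateway only two components drift (instance and client); with per-customer gateways every cell of this three-way matrix has to be supported, which is the combinatorial problem the bullet describes.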
Affected groups
These groups should be consulted on this decision as it affects them and their roadmap.
- group::ai framework: AI Framework owns the AI Gateway and the AI Architecture Blueprint.
- group::cloud connector: "we were planning to evolve the current architecture, which was to move feature backends such as the AI gateway behind a central Cloud Connector gateway: https://docs.gitlab.com/ee/architecture/blueprints/cloud_connector/" (comment)
- group::code creation: issue discussing the possibility of changing the architecture to improve latency for the Code Suggestions feature.
- Incubation Engineering Department: this team is working on #f_custom_agents and
Resources & Related links
- GitLab Duo Glossary
- AI Architecture Blueprint (current architectural vision)
- Demo of POC for Custom Models, which discusses two options for the architecture direction.
- [Investigation] Client <> AI Gateway Architectu... (#434063)
- [Investigation] Client <> LLM Architecture chan... (#433551 - closed)
- AI Agents MVC (gitlab-org/incubation-engineering/ai-assist&4)
- Improved Code Suggestions Architecture to Decre... (&12224)
- Meeting recording of @timzallmann discussing why bring-your-own-model doesn't work but custom models do (start at 6 minutes 15 seconds)