Spike: Simplify Model Updates and Migrations
Background
The AI tools and features in GitLab each reference the specific AI model they consume at the level where the prompts are built, unless they fall back to the default model set for the provider. There are currently only 8 models the tools can choose from, split over 2 providers and initialized in the available_models.rb file. Because many of the tools use the same models, setting the specific model at such a low level is redundant. This redundancy becomes a problem when updating or transitioning models, since each tool must then be updated independently at the prompt level.
Example Model Reference Definitions
CLAUDE_3_5_SONNET = 'claude-3-5-sonnet-20240620'
CLAUDE_3_SONNET = 'claude-3-sonnet-20240229'
CLAUDE_3_HAIKU = 'claude-3-haiku-20240307'
CLAUDE_2_1 = 'claude-2.1'
VERTEX_MODEL_CHAT = 'chat-bison'
VERTEX_MODEL_CODE = 'code-bison'
VERTEX_MODEL_CODECHAT = 'codechat-bison'
VERTEX_MODEL_TEXT = 'text-bison'
Example Implementation
{
  prompt: conversation,
  options: { model: ::Gitlab::Llm::Anthropic::Client::CLAUDE_3_HAIKU }
}
Improvements
Transition from model names to function classifications
The first set of changes to simplify the model migration process would be to abstract away the specific model name references in the tools, replacing them with model function classifications (e.g. "Latest_Instant_Model"). These classifications are more general and remain valid even if the model they reference changes. This allows a single, straightforward update in one location instead of manually updating the model in each specific prompt file for dozens of tools. Tools would still retain the flexibility to pin a specific model, which could be needed for tools that require the previous generation of a model. A sketch of this follows below.
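A minimal sketch of what this could look like, assuming hypothetical classification constants (LATEST_INSTANT_MODEL, etc.) defined next to the existing constants in available_models.rb:
# Hypothetical function classifications, defined once alongside the existing
# model constants. Updating a model only requires changing this mapping in one place.
LATEST_INSTANT_MODEL = CLAUDE_3_HAIKU
LATEST_LARGE_MODEL = CLAUDE_3_5_SONNET
PREVIOUS_LARGE_MODEL = CLAUDE_3_SONNET

# A tool would then reference the classification instead of a concrete model:
{
  prompt: conversation,
  options: { model: ::Gitlab::Llm::Anthropic::Client::LATEST_INSTANT_MODEL }
}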
Migrate resolving the specific model to use, based on function classification, to AIGW.
Currently, model selection happens within the GitLab monolith. The proposal is to migrate that resolution logic to the AI Gateway.
This offers several advantages:
- Centralized Configuration: Model mappings (from classifications like "Latest_Instant_Model" to actual model names like "claude-instant-1.2") would be managed in the AI Gateway's configuration, making updates simpler and atomic.
- Reduced Monolith Complexity: The monolith would only need to specify the desired functional classification, simplifying its code and reducing the risk of errors during model updates (see the sketch after this list).
- Gateway Flexibility: The Gateway could implement more complex model selection logic, such as A/B testing or dynamic selection based on request characteristics, without requiring monolith changes.
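As a rough sketch of the end state, assuming a hypothetical model_classification option (not an existing parameter): the tool sends only the functional classification, and the Gateway's configuration maps it to a concrete model name:
# Hypothetical: the monolith no longer resolves a concrete model name.
# The Gateway maps 'Latest_Instant_Model' to e.g. 'claude-3-haiku-20240307'
# in its own configuration, so model updates never touch this code.
{
  prompt: conversation,
  options: { model_classification: 'Latest_Instant_Model' }
}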
Potential to leverage existing code: AIGW already has several implemented features that will aid in this transition. Some relate to existing functionality and others are a product of forward thinking.
- Model Configuration:
  - ai_gateway/config.py: The Config class defines various model configurations. Crucially, the default_prompts and model_engine_concurrency_limits sections are key for model selection: they allow configuring which model ('base', 'mistral', etc.) is used by default for a given prompt ID, and specify concurrency limits per model and engine. This data would need to move from the monolith to the Gateway config.
  - ai_gateway/prompts/config/base.py, ai_gateway/prompts/config/models.py: These define data structures that hold model parameters (temperature, top_p, etc.) and link them to specific prompt configurations. This allows flexible per-prompt model tuning, again configurable within the Gateway.
  - ai_gateway/prompts/registry.py: The LocalPromptRegistry reads prompt definitions (from YAML files) and instantiates Prompt objects. The YAML files now specify which model configuration to use for each prompt. The mapping between prompts and models would reside here, migrated from the monolith.
- Model Instantiation:
  - ai_gateway/models/container.py: The ContainerModels class provides dependency injection for various models (Anthropic, Vertex, LiteLLM). This is where the actual model objects are created based on the configuration.
  - ai_gateway/prompts/base.py: The Prompt class uses a model_factory and the prompt configuration to create the specific model instance when a prompt is requested. The model_metadata parameter allows overriding the default model configuration at request time, giving additional flexibility (although it is conditional on custom_models_enabled).
- Request Handling:
  - ai_gateway/api/v1/prompts/invoke.py and ai_gateway/api/v2/code/completions.py: These endpoint handlers receive the request, including potentially model selection information (in prompt_request.model_metadata or payload.model_provider and payload.model_name). They use the PromptRegistry to fetch the appropriate Prompt object, which in turn creates the correct model instance based on the combined default configuration and any request overrides. If the monolith wants to select a model other than the default, it would pass the appropriate parameters in the request to these endpoints (see the payload sketch below).
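For illustration, a sketch of a request payload the monolith could send to override the Gateway default, using the model_provider and model_name fields mentioned above; the surrounding payload shape and the provider value are assumptions:
# Hypothetical payload fragment; only model_provider and model_name are taken
# from the endpoint description above, the rest of the request is unchanged.
{
  # ...existing prompt/completion request fields...
  model_provider: 'anthropic',
  model_name: 'claude-3-haiku-20240307'
}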
Summary - The current AIGW code enables model selection by providing a configurable mapping between prompt IDs and models (in the YAML prompt definitions), allowing runtime model overrides in the request, and using dependency injection to create the selected models. The actual model selection logic would need to be implemented in the GitLab monolith (for now) and communicated to the Gateway via request parameters. The existing code prepares the Gateway to receive and act on that selection information, which would also ensure backwards compatibility.