Introduce generic embeddings LLM class
## What does this MR do and why?
Introduces a generic embeddings LLM class, `Gitlab::Llm::Embeddings::CodeEmbeddings`, for sending embeddings requests to AIGW's new `/v1/embeddings/code_embeddings` endpoint.

This adds:

- `Gitlab::Llm::Embeddings::CodeEmbeddings` - main LLM class that handles batch processing with recursive splitting on token limit errors
- `Gitlab::Llm::Embeddings::Client` - HTTP client for communicating with AIGW
- `Gitlab::Llm::Embeddings::ModelDefinition` - configuration for model parameters and AIGW URL selection
- `Gitlab::Llm::Embeddings::Response` - response wrapper for parsing embeddings from AIGW
The new class replaces `Ai::ActiveContext::Embeddings::Code::VertexText` and is controlled by the `use_code_embeddings_llm_class` feature flag.
## References
Related to #590572 (closed)
## How to set up and validate locally
### Prerequisites
- Enable the feature flag: `Feature.enable(:use_code_embeddings_llm_class)`
- Ensure AIGW is running and accessible
- Check out the `1866-add-embeddings-endpoint` branch on AIGW
### Test 1: Basic batch processing with `batch_size`
```ruby
model_definition = Gitlab::Llm::Embeddings::ModelDefinition.for_gitlab_provided_code_embeddings(
  identifier: 'text_embedding_005_vertex',
  use_cloud_aigw: false
)

# Test with a batch size of 2
embedder = Gitlab::Llm::Embeddings::CodeEmbeddings.new(
  ['content a', 'content b', 'content c', 'content d', 'content e'],
  unit_primitive: 'generate_embeddings_codebase',
  user: nil,
  model_definition: model_definition,
  batch_size: 2
)

result = embedder.execute
```
Expected behavior:
- Should see 3 logs in `gdk tail gitlab-ai-gateway | grep instances` (batches of 2, 2, and 1)
- Should get back 5 embeddings in the result
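The expected batch shape can be sketched with plain Ruby (illustrative only; the actual splitting happens inside `CodeEmbeddings`):

```ruby
# Five contents with batch_size: 2 produce batches of 2, 2, and 1,
# matching the three `grep instances` log lines above.
contents = ['content a', 'content b', 'content c', 'content d', 'content e']
batches = contents.each_slice(2).to_a

batches.map(&:size) # => [2, 2, 1]
```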
### Test 2: Recursive batch splitting on token limit exceeded
Apply this patch to AIGW to simulate a token limit error for any batch larger than one:
```diff
diff --git a/ai_gateway/api/v1/embeddings/code_embeddings.py b/ai_gateway/api/v1/embeddings/code_embeddings.py
--- a/ai_gateway/api/v1/embeddings/code_embeddings.py
+++ b/ai_gateway/api/v1/embeddings/code_embeddings.py
@@ -36,6 +36,12 @@
     _validate_model_metadata_payload(payload.model_metadata)
 
+    if len(payload.contents) > 1:
+        raise HTTPException(
+            status_code=status.HTTP_400_BAD_REQUEST,
+            detail="the input token count is 9999 but the model supports up to 2048",
+        )
+
     prompt = prompt_registry.get_on_behalf(
         user=current_user,
         prompt_id=CODE_EMBEDDINGS_PROMPT_ID,
```
```ruby
model_definition = Gitlab::Llm::Embeddings::ModelDefinition.for_gitlab_provided_code_embeddings(
  identifier: 'text_embedding_005_vertex',
  use_cloud_aigw: false
)

# Test without explicit batch_size (sends all at once)
embedder = Gitlab::Llm::Embeddings::CodeEmbeddings.new(
  ['content a', 'content b', 'content c', 'content d', 'content e'],
  unit_primitive: 'generate_embeddings_codebase',
  user: nil,
  model_definition: model_definition
)

result = embedder.execute
```
Expected behavior:
- The batch is automatically split in half and retried
- Should see 5 logs in `gdk tail gitlab-ai-gateway | grep instances` (recursive splitting)
- Should get back 5 embeddings in the result
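The splitting strategy above can be sketched as follows. This is a minimal, self-contained illustration, not the actual implementation: `embed_batch`, `TokenLimitError`, and the fake request are hypothetical names.

```ruby
# Sketch: retry with halved batches whenever the backend reports a
# token-limit error, until every batch succeeds (or a single item fails).
class TokenLimitError < StandardError; end

def embed_batch(contents, &request)
  request.call(contents)
rescue TokenLimitError
  raise if contents.size <= 1 # a single item that still exceeds the limit is fatal

  mid = contents.size / 2
  embed_batch(contents[0...mid], &request) + embed_batch(contents[mid..], &request)
end

# Fake backend mimicking the patched AIGW: any batch larger than 1 fails.
calls = []
fake_request = ->(batch) do
  raise TokenLimitError if batch.size > 1

  calls << batch
  batch.map { |content| [content.length.to_f] } # stand-in embedding vector
end

result = embed_batch(%w[a b c d e], &fake_request)
result.size # => 5 embeddings, produced by 5 single-item requests
```

With the patch applied, only single-item batches ever succeed, which is why five `grep instances` log lines appear for five contents.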
### Test 3: Verify feature flag controls behavior
```ruby
# With FF disabled
Feature.disable(:use_code_embeddings_llm_class)
# Should use Ai::ActiveContext::Embeddings::Code::VertexText

# With FF enabled
Feature.enable(:use_code_embeddings_llm_class)
# Should use Gitlab::Llm::Embeddings::CodeEmbeddings
```
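The flag-gated dispatch can be sketched like this. `Feature` and both classes are stand-ins here (the real ones live in the GitLab codebase), so this only illustrates the selection logic, not the actual call site.

```ruby
# Stand-in for GitLab's Feature flag helper.
module Feature
  @flags = {}

  def self.enable(name)
    @flags[name] = true
  end

  def self.enabled?(name)
    @flags.fetch(name, false)
  end
end

CodeEmbeddings = Class.new # stand-in for Gitlab::Llm::Embeddings::CodeEmbeddings
VertexText = Class.new     # stand-in for Ai::ActiveContext::Embeddings::Code::VertexText

def embeddings_class
  Feature.enabled?(:use_code_embeddings_llm_class) ? CodeEmbeddings : VertexText
end

embeddings_class # => VertexText (flag off by default)
Feature.enable(:use_code_embeddings_llm_class)
embeddings_class # => CodeEmbeddings
```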
### Test 4: Cloud AIGW URL selection
```ruby
model_definition = Gitlab::Llm::Embeddings::ModelDefinition.for_gitlab_provided_code_embeddings(
  identifier: 'text_embedding_005_vertex',
  use_cloud_aigw: true
)

model_definition.aigw_base_url
# => "https://cloud.staging.gitlab.com/ai"

embedder = Gitlab::Llm::Embeddings::CodeEmbeddings.new(
  ['content a', 'content b'],
  unit_primitive: 'generate_embeddings_codebase',
  user: nil,
  model_definition: model_definition,
  batch_size: 2
)

result = embedder.execute
```
Expected behavior:
- Requests are sent to the cloud connector AIGW URL (not the local instance)
- No logs will appear in `gdk tail gitlab-ai-gateway` since requests go through the cloud connector endpoint
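The URL selection can be sketched as a simple toggle. The cloud URL comes from the console output above; the local URL and the function name are placeholders, not the actual `ModelDefinition` internals.

```ruby
# Hypothetical sketch of ModelDefinition's base URL selection.
CLOUD_AIGW_URL = 'https://cloud.staging.gitlab.com/ai'
LOCAL_AIGW_URL = 'http://localhost:5052' # placeholder for the local GDK AIGW

def aigw_base_url(use_cloud_aigw:)
  use_cloud_aigw ? CLOUD_AIGW_URL : LOCAL_AIGW_URL
end

aigw_base_url(use_cloud_aigw: true) # => "https://cloud.staging.gitlab.com/ai"
```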
### Test 5: End-to-end
Run this with and without the FF enabled:
```ruby
# Track a ref and execute the queue
Ai::ActiveContext::Collections::Code.track!({ id: '...', routing: ... })
ActiveContext.execute_all_queues!
```
## MR acceptance checklist
Evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.
Edited by Madelein van Niekerk