Introduce generic embeddings LLM class

What does this MR do and why?

Introduces a generic embeddings LLM class, Gitlab::Llm::Embeddings::CodeEmbeddings, for sending embedding requests to the AI Gateway's (AIGW) new /v1/embeddings/code_embeddings endpoint.

This adds:

  • Gitlab::Llm::Embeddings::CodeEmbeddings - the main LLM class; handles batch processing and recursively splits batches when a token limit error is returned
  • Gitlab::Llm::Embeddings::Client - HTTP client for communicating with AIGW
  • Gitlab::Llm::Embeddings::ModelDefinition - configuration for model parameters and AIGW URL selection
  • Gitlab::Llm::Embeddings::Response - response wrapper for parsing embeddings from AIGW

The new class replaces Ai::ActiveContext::Embeddings::Code::VertexText and is controlled by the use_code_embeddings_llm_class feature flag.
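The batching-and-splitting behavior can be sketched roughly like this. This is a simplified, self-contained illustration of the technique, not the actual class: `TokenLimitError`, the `embed_batch` callable, and the halving strategy are assumptions standing in for the real AIGW client and error handling.

```ruby
# Simplified sketch of batch processing with recursive splitting.
# TokenLimitError stands in for the AIGW token-limit error response.
class TokenLimitError < StandardError; end

# Embeds contents in slices of batch_size; when a batch is rejected for
# exceeding the token limit, it is split in half and each half is retried.
def embed_with_splitting(contents, batch_size: contents.size, &embed_batch)
  contents.each_slice(batch_size).flat_map do |batch|
    begin
      embed_batch.call(batch)
    rescue TokenLimitError
      raise if batch.size <= 1 # a single item cannot be split further

      half = (batch.size + 1) / 2
      embed_with_splitting(batch.first(half), &embed_batch) +
        embed_with_splitting(batch.drop(half), &embed_batch)
    end
  end
end
```

With an embedder that rejects any batch of more than one item (mirroring the AIGW patch in Test 2 below), five contents end up as five single-item requests and five embeddings back.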

References

Related to #590572 (closed)

How to set up and validate locally

Prerequisites

  1. Enable the feature flag: Feature.enable(:use_code_embeddings_llm_class)
  2. Ensure AIGW is running and accessible
  3. Check out the 1866-add-embeddings-endpoint branch in AIGW

Test 1: Basic batch processing with batch_size

model_definition = Gitlab::Llm::Embeddings::ModelDefinition.for_gitlab_provided_code_embeddings(
  identifier: 'text_embedding_005_vertex',
  use_cloud_aigw: false
)

# Test with a batch size of 2
embedder = Gitlab::Llm::Embeddings::CodeEmbeddings.new(
  ['content a', 'content b', 'content c', 'content d', 'content e'],
  unit_primitive: 'generate_embeddings_codebase',
  user: nil,
  model_definition: model_definition,
  batch_size: 2
)

result = embedder.execute

Expected behavior:

  • Should see 3 logs in gdk tail gitlab-ai-gateway | grep instances (batches of 2, 2, 1)
  • Should get back 5 embeddings in the result
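The expected batch sizes fall directly out of slicing five contents by a batch size of 2; a one-liner to check the arithmetic:

```ruby
# Five contents sliced into batches of at most 2 give sizes 2, 2 and 1,
# which is why three requests (and three log lines) are expected above.
contents = ['content a', 'content b', 'content c', 'content d', 'content e']
batch_sizes = contents.each_slice(2).map(&:size)
# batch_sizes == [2, 2, 1]
```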

Test 2: Recursive batch splitting on token limit exceeded

Apply this patch to AIGW so that any request with more than one content raises a token-limit error:

diff --git a/ai_gateway/api/v1/embeddings/code_embeddings.py b/ai_gateway/api/v1/embeddings/code_embeddings.py
--- a/ai_gateway/api/v1/embeddings/code_embeddings.py
+++ b/ai_gateway/api/v1/embeddings/code_embeddings.py
@@ -36,6 +36,12 @@
 
     _validate_model_metadata_payload(payload.model_metadata)
 
+    if len(payload.contents) > 1:
+        raise HTTPException(
+            status_code=status.HTTP_400_BAD_REQUEST,
+            detail="the input token count is 9999 but the model supports up to 2048",
+        )
+
     prompt = prompt_registry.get_on_behalf(
         user=current_user,
         prompt_id=CODE_EMBEDDINGS_PROMPT_ID,
With the patch applied, run the embedder without an explicit batch_size:

model_definition = Gitlab::Llm::Embeddings::ModelDefinition.for_gitlab_provided_code_embeddings(
  identifier: 'text_embedding_005_vertex',
  use_cloud_aigw: false
)

# Test without explicit batch_size (sends all at once)
embedder = Gitlab::Llm::Embeddings::CodeEmbeddings.new(
  ['content a', 'content b', 'content c', 'content d', 'content e'],
  unit_primitive: 'generate_embeddings_codebase',
  user: nil,
  model_definition: model_definition
)

result = embedder.execute

Expected behavior:

  • The batch is automatically split in half and retried
  • Should see 5 logs in gdk tail gitlab-ai-gateway | grep instances (recursive splitting)
  • Should get back 5 embeddings in the result

Test 3: Verify feature flag controls behavior

# With FF disabled
Feature.disable(:use_code_embeddings_llm_class)
# Should use Ai::ActiveContext::Embeddings::Code::VertexText

# With FF enabled
Feature.enable(:use_code_embeddings_llm_class)
# Should use Gitlab::Llm::Embeddings::CodeEmbeddings

Test 4: Cloud AIGW URL selection

model_definition = Gitlab::Llm::Embeddings::ModelDefinition.for_gitlab_provided_code_embeddings(
  identifier: 'text_embedding_005_vertex',
  use_cloud_aigw: true
)

model_definition.aigw_base_url
=> "https://cloud.staging.gitlab.com/ai"

embedder = Gitlab::Llm::Embeddings::CodeEmbeddings.new(
  ['content a', 'content b'],
  unit_primitive: 'generate_embeddings_codebase',
  user: nil,
  model_definition: model_definition,
  batch_size: 2
)

result = embedder.execute

Expected behavior:

  • Requests are sent to the cloud connector AIGW URL, not the local AIGW instance
  • No logs appear in gdk tail gitlab-ai-gateway since requests go to the cloud connector endpoint
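Conceptually, the ModelDefinition URL selection reduces to a flag check. A hypothetical sketch for illustration only: the local URL and the method body are assumptions, not the real implementation.

```ruby
# Hypothetical illustration of AIGW base URL selection; the actual
# ModelDefinition logic may differ.
CLOUD_AIGW_URL = 'https://cloud.staging.gitlab.com/ai' # seen in Test 4 above
LOCAL_AIGW_URL = 'http://localhost:5052' # assumed default local AIGW port

def aigw_base_url(use_cloud_aigw:)
  use_cloud_aigw ? CLOUD_AIGW_URL : LOCAL_AIGW_URL
end
```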

Test 5: End-to-end

Run this with and without the FF enabled:

# Track a ref and execute the queue
Ai::ActiveContext::Collections::Code.track!({ id: '...', routing: ... })
ActiveContext.execute_all_queues!

MR acceptance checklist

Evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.

Edited by Madelein van Niekerk
