Use distinct endpoint for generating search target embeddings

What does this MR do and why?

We introduced a distinct AIGW endpoint for generating search target embeddings in feat: add embeddings endpoint for search (gitlab-org/modelops/applied-ml/code-suggestions/ai-assist!5046 - merged).

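This MR switches search target embedding generation to the new endpoint, behind the use_distinct_search_embeddings_endpoint feature flag.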

References

Screenshots or screen recordings

N/A - see validation steps

How to set up and validate locally

1 - Setup

  1. Make sure that your local AIGW already has the changes from feat: add embeddings endpoint for search (gitlab-org/modelops/applied-ml/code-suggestions/ai-assist!5046 - merged)

  2. Enable the Feature Flag:

    Feature.enable(:use_distinct_search_embeddings_endpoint)

2 - Send an embeddings request

  • Generate embeddings through the search_embedding_model

    # run the following in the Rails console:
    ::Ai::ActiveContext::Collections::Code.search_embedding_model.generate_embeddings("test", user: User.first, unit_primitive: 'generate_embeddings_codebase')
  • OR test the semantic_code_search MCP tool through the MCP Inspector or anything similar

3 - Verify

  • Check the Rails logs to confirm that the embeddings request is sent to the search embeddings endpoint: /v1/embeddings/code_embeddings/search

    # in the `gitlab-org/gitlab` directory
    tail -f log/llm.log
    
    # you should get the following (check the `url` and `ai_component` fields):
    {"severity":"INFO","time":"2026-03-27T04:19:30.293Z","unit_primitive":"generate_embeddings_codebase",
     "url":"http://gdk.test:5052/v1/embeddings/code_embeddings/search",
     "params":"{\"model_metadata\":{\"provider\":\"gitlab\",\"identifier\":\"text_embedding_005_vertex\"}}","message":"Performing embeddings request","class":"Gitlab::Llm::Embeddings::Client","ai_event_name":"performing_request",
     "ai_component":"code_embeddings_search"
    }
    {"severity":"INFO","time":"2026-03-27T04:19:31.145Z","correlation_id":"17e7008768c62e0a1139eb1ac3c00f69","unit_primitive":"generate_embeddings_codebase",
     "url":"http://gdk.test:5052/v1/embeddings/code_embeddings/search",
     "message":"Received embeddings response","class":"Gitlab::Llm::Embeddings::Client","ai_event_name":"response_received",
     "ai_component":"code_embeddings_search"
    }
  • OR check the AI Gateway debug log

    # in the GDK directory
    tail -f log/gitlab-ai-gateway/gateway_debug.log
    
    # this logs a lot of information, but one of the last lines should show the invoked endpoint
    2026-03-31T23:21:27.689020Z [info     ] 172.16.123.1:52757 - "POST /v1/embeddings/code_embeddings/search HTTP/1.1" 200 [api.access] client_ip=172.16.123.1 client_port=52757 content_type=application/json correlation_id=01KN337AQRH15WSRSZXPV4TYMV cpu_s=0.15320800000000023 duration_request=0.009139060974121094 duration_s=3.008309458993608 enabled-instance-verbose-ai-logs=False enabled_feature_flags= first_chunk_duration_s=3.00822208399768 gitlab_feature_enabled_by_namespace_ids= gitlab_feature_enablement_type= gitlab_global_user_id='cBvp8+rYMVJ482CJgnCet9uH02adHDGS0hJdKJ8yXEw=' gitlab_host_name=gdk.test gitlab_instance_id=7b8ebdcb-fa06-44fe-bf65-2e9dfafe670a gitlab_language_server_version=None gitlab_realm=self-managed gitlab_root_namespace_id=None gitlab_saas_duo_pro_namespace_ids=None gitlab_version=18.11.0 http_version=1.1 is_gitlab_team_member=false meta.feature_category=global_search method=POST path=/v1/embeddings/code_embeddings/search request_arrived_at=2026-03-31T23:21:24.680605+00:00 response_start_duration_s=3.0081596670206636 stage=main status_code=200 type=mlops url=http://gdk.test:5052/v1/embeddings/code_embeddings/search user_agent=Ruby
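
The manual checks above could also be scripted. The following is a minimal sketch, not part of this MR: the helper names (embeddings_entries, search_endpoint?, gateway_hit?) are hypothetical, and it assumes that log/llm.log holds one JSON object per line and that the gateway access line has the shape shown in the sample above.

```ruby
require 'json'

# Expected values, copied from the sample log output in this description.
EXPECTED_PATH = '/v1/embeddings/code_embeddings/search'
EXPECTED_COMPONENT = 'code_embeddings_search'
CLIENT_CLASS = 'Gitlab::Llm::Embeddings::Client'

# Assumed shape of the AI Gateway access line: "POST <path> HTTP/x.y" 200
GATEWAY_HIT = %r{"POST #{Regexp.escape(EXPECTED_PATH)} HTTP/[\d.]+" 200}

# Hypothetical helper: returns the parsed entries logged by the embeddings
# client. Lines that are not valid JSON objects are skipped.
def embeddings_entries(log_text)
  log_text.each_line.filter_map do |line|
    begin
      entry = JSON.parse(line)
    rescue JSON::ParserError
      next
    end
    entry if entry.is_a?(Hash) && entry['class'] == CLIENT_CLASS
  end
end

# Hypothetical helper: true when a Rails log entry targets the search
# embeddings endpoint and carries the matching ai_component.
def search_endpoint?(entry)
  entry['url'].to_s.end_with?(EXPECTED_PATH) &&
    entry['ai_component'] == EXPECTED_COMPONENT
end

# Hypothetical helper: true when a gateway debug log line records a
# successful POST to the search embeddings endpoint.
def gateway_hit?(line)
  line.match?(GATEWAY_HIT)
end

# Small demonstration on inline sample data (abbreviated from the logs above).
rails_sample = <<~LOG
  {"class":"Gitlab::Llm::Embeddings::Client","url":"http://gdk.test:5052/v1/embeddings/code_embeddings/search","ai_component":"code_embeddings_search"}
  this line is not JSON and is ignored
LOG
gateway_sample = '172.16.123.1:52757 - "POST /v1/embeddings/code_embeddings/search HTTP/1.1" 200 [api.access]'

entries = embeddings_entries(rails_sample)
rails_ok = !entries.empty? && entries.all? { |entry| search_endpoint?(entry) }
puts "Rails log check passed: #{rails_ok}"
puts "Gateway log check passed: #{gateway_hit?(gateway_sample)}"
```

To run it against a real log, pass File.read('log/llm.log') to embeddings_entries instead of the inline sample. Note that the whole file is then checked, including any entries logged before the feature flag was enabled.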

MR acceptance checklist

Evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.

Related to #594971 (closed)

Edited by Pam Artiaga
