Use distinct endpoint for generating search target embeddings
What does this MR do and why?
We introduced a distinct AIGW endpoint for generating search target embeddings in feat: add embeddings endpoint for search (gitlab-org/modelops/applied-ml/code-suggestions/ai-assist!5046 - merged).
This MR switches search embedding generation to that endpoint, behind the `use_distinct_search_embeddings_endpoint` feature flag.
References
- Issue: #594971 (closed)
- Discussion: https://gitlab.com/groups/gitlab-org/-/work_items/21437#note_3196131142
Screenshots or screen recordings
N/A - see validation steps
How to set up and validate locally
1 - Setup
- Make sure that your local AIGW already has the changes from feat: add embeddings endpoint for search (gitlab-org/modelops/applied-ml/code-suggestions/ai-assist!5046 - merged).
- Enable the feature flag in the Rails console:
  Feature.enable(:use_distinct_search_embeddings_endpoint)
2 - Run search
- Generate embeddings through the search_embedding_model:
  # run the following on the rails console:
  ::Ai::ActiveContext::Collections::Code.search_embedding_model.generate_embeddings("test", user: User.first, unit_primitive: 'generate_embeddings_codebase')
- OR test the semantic_code_search MCP tool through the MCP Inspector or a similar client.
3 - Verify
- Check the Rails logs to confirm that the embeddings request is sent to the search embeddings endpoint, /v1/embeddings/code_embeddings/search:
  # on the `gitlab-org/gitlab` directory
  > tail -f log/llm.log
  # you should get the following (check the `url` and `ai_component` fields):
  {"severity":"INFO","time":"2026-03-27T04:19:30.293Z","unit_primitive":"generate_embeddings_codebase", "url":"http://gdk.test:5052/v1/embeddings/code_embeddings/search", "params":"{\"model_metadata\":{\"provider\":\"gitlab\",\"identifier\":\"text_embedding_005_vertex\"}}","message":"Performing embeddings request","class":"Gitlab::Llm::Embeddings::Client","ai_event_name":"performing_request", "ai_component":"code_embeddings_search" }
  {"severity":"INFO","time":"2026-03-27T04:19:31.145Z","correlation_id":"17e7008768c62e0a1139eb1ac3c00f69","unit_primitive":"generate_embeddings_codebase", "url":"http://gdk.test:5052/v1/embeddings/code_embeddings/search", "message":"Received embeddings response","class":"Gitlab::Llm::Embeddings::Client","ai_event_name":"response_received", "ai_component":"code_embeddings_search" }
- OR check the AI Gateway debug log:
  # on the GDK directory
  tail -f log/gitlab-ai-gateway/gateway_debug.log
  ### this will log a lot of information, but one of the last lines should have the invoked endpoint
  2026-03-31T23:21:27.689020Z [info ] 172.16.123.1:52757 - "POST /v1/embeddings/code_embeddings/search HTTP/1.1" 200 [api.access] client_ip=172.16.123.1 client_port=52757 content_type=application/json correlation_id=01KN337AQRH15WSRSZXPV4TYMV cpu_s=0.15320800000000023 duration_request=0.009139060974121094 duration_s=3.008309458993608 enabled-instance-verbose-ai-logs=False enabled_feature_flags= first_chunk_duration_s=3.00822208399768 gitlab_feature_enabled_by_namespace_ids= gitlab_feature_enablement_type= gitlab_global_user_id='cBvp8+rYMVJ482CJgnCet9uH02adHDGS0hJdKJ8yXEw=' gitlab_host_name=gdk.test gitlab_instance_id=7b8ebdcb-fa06-44fe-bf65-2e9dfafe670a gitlab_language_server_version=None gitlab_realm=self-managed gitlab_root_namespace_id=None gitlab_saas_duo_pro_namespace_ids=None gitlab_version=18.11.0 http_version=1.1 is_gitlab_team_member=false meta.feature_category=global_search method=POST path=/v1/embeddings/code_embeddings/search request_arrived_at=2026-03-31T23:21:24.680605+00:00 response_start_duration_s=3.0081596670206636 stage=main status_code=200 type=mlops url=http://gdk.test:5052/v1/embeddings/code_embeddings/search user_agent=Ruby
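If eyeballing `log/llm.log` is tedious, the Rails-log check can be scripted. The following is a minimal sketch in plain Ruby, not part of this MR: the helper names (`embeddings_log_entries`, `search_endpoint_entry?`, `all_use_search_endpoint?`) are hypothetical, and it assumes the log contains one JSON object per line with the `class` and `url` fields shown in the sample output.

```ruby
require 'json'
require 'uri'

# Path of the search embeddings endpoint, as given in the verification step.
SEARCH_EMBEDDINGS_PATH = '/v1/embeddings/code_embeddings/search'

# Hypothetical helper: parses llm.log lines and keeps only the entries
# logged by Gitlab::Llm::Embeddings::Client. Non-JSON lines are skipped.
def embeddings_log_entries(lines)
  lines.filter_map do |line|
    entry = JSON.parse(line)
    entry if entry.is_a?(Hash) && entry['class'] == 'Gitlab::Llm::Embeddings::Client'
  rescue JSON::ParserError
    nil
  end
end

# Hypothetical helper: true when the entry's `url` field points at the
# search embeddings path, regardless of host and port.
def search_endpoint_entry?(entry)
  url = entry['url']
  return false unless url.is_a?(String)

  URI.parse(url).path == SEARCH_EMBEDDINGS_PATH
rescue URI::InvalidURIError
  false
end

# Hypothetical helper: true when at least one embeddings entry exists and
# every embeddings entry targets the search endpoint.
def all_use_search_endpoint?(lines)
  entries = embeddings_log_entries(lines)
  !entries.empty? && entries.all? { |entry| search_endpoint_entry?(entry) }
end
```

For example, from the `gitlab-org/gitlab` directory, `all_use_search_endpoint?(File.readlines('log/llm.log').last(20))` should return `true` right after running step 2 with the feature flag enabled. The window of 20 lines is an arbitrary choice; widen or narrow it to cover only the requests you just triggered.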
MR acceptance checklist
Evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.
Related to #594971 (closed)
