Migrate Vertex AI Proxy to new Tokens path

What does this MR do and why?

Prepare to migrate the Vertex AI Proxy to the new Tokens path: use CloudConnector::Tokens.

More information about UPs used with this endpoint: #554541 (closed).

Impact

This change is expected to be a no-op.
The only Unit Primitive currently used with the Vertex AI proxy endpoint is semantic_search_issue (see the thread below).
It should not affect semantic_search_issue either, but we can validate this change for that UP.

Rollout

⚠️ Note that I am not including Vertex-related UPs in ROLLED_OUT_UNIT_PRIMITIVES in this MR, and I am not introducing FFs in this MR either.
That means this MR continues to use the old token flow for now; the only technical difference is that we now pass a service with the corresponding UP name (not the generic vertex_ai_proxy) into the Gitlab::AiGateway.headers call. Note that we now have an ai_feature_name: argument, which is supplied with vertex_ai_proxy to search through the catalogue: https://gitlab.com/gitlab-org/gitlab/-/blob/37088bda8311f33aec98fedd6adb5cdf2e6b716d/ee/lib/gitlab/llm/utils/ai_features_catalogue.rb#L175.
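The per-UP service resolution can be pictured roughly as follows. This is an illustrative Ruby sketch only: the CATALOGUE constant and service_for helper are made up for this example; the real lookup lives in ai_features_catalogue.rb.

```ruby
# Hypothetical sketch (not the actual implementation): resolve a Cloud
# Connector service per Unit Primitive instead of the generic :vertex_ai_proxy.
CATALOGUE = {
  'semantic_search_issue' => :semantic_search_issue
}.freeze

def service_for(unit_primitive)
  # Fall back to the generic proxy service when the UP is not catalogued.
  CATALOGUE.fetch(unit_primitive, :vertex_ai_proxy)
end

puts service_for('semantic_search_issue') # => semantic_search_issue
puts service_for('some_unknown_up')       # => vertex_ai_proxy
```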

Our plan is to roll out code_suggestions_new_tokens_path and anthropic_proxy_new_tokens_path and then drop the fallback mechanism entirely, so all new UPs will be rolled out automatically; that will include Vertex-related UPs and all services migrated to the new Tokens path in the future.
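The interim gating described above can be sketched as follows. This is an assumption-laden illustration, not the real code: the constant and predicate names are hypothetical; only the flag-name pattern (code_suggestions_new_tokens_path, anthropic_proxy_new_tokens_path) comes from the plan above.

```ruby
# Illustrative sketch of the rollout gate (not the actual implementation):
# a UP uses the new Tokens path once it is either listed in
# ROLLED_OUT_UNIT_PRIMITIVES or enabled via a per-endpoint feature flag.
ROLLED_OUT_UNIT_PRIMITIVES = %w[code_suggestions].freeze

def new_tokens_path?(unit_primitive, enabled_flags)
  ROLLED_OUT_UNIT_PRIMITIVES.include?(unit_primitive) ||
    enabled_flags.include?("#{unit_primitive}_new_tokens_path")
end

puts new_tokens_path?('code_suggestions', [])                              # => true
puts new_tokens_path?('anthropic_proxy', ['anthropic_proxy_new_tokens_path']) # => true
puts new_tokens_path?('semantic_search_issue', [])                         # => false
```

Dropping the fallback amounts to making this predicate return true unconditionally, which is why future UPs would not need their own flags.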

How to set up and validate locally

We want to verify that the endpoint with semantic_search_issue works as before.

To do that, I followed the instructions here: https://gitlab.com/gitlab-org/gitlab/-/blob/master/doc/development/ai_features/embeddings.md?plain=0&ref_type=heads#adding-work-item-embeddings-locally

As a final step, running Gitlab::Llm::VertexAi::Embeddings::Text.new('text', user: nil, tracking_context: {}, unit_primitive: 'semantic_search_issue').execute succeeded (I got a vector back), and I saw the related entries in my AI GW logs:

2025-08-22 12:22:47 [info     ] Request to LLM complete        correlation_id=6b3377011d099b5ac1552cd8ebc1e055 duration=0.8583916669886094 source=ai_gateway.instrumentators.model_requests
2025-08-22 12:22:47 [info     ] 127.0.0.1:52305 - "POST /v1/proxy/vertex-ai/v1/projects/PROJECT/locations/LOCATION/publishers/google/models/text-embedding-005%3Apredict HTTP/1.1" 200 auth_duration_s=0.19122983299894258 client_ip=127.0.0.1 client_port=52305 content_type='application/json; charset=UTF-8' correlation_id=6b3377011d099b5ac1552cd8ebc1e055 cpu_s=0.15309600000000145 duration_request=0.01607990264892578 duration_s=1.8154523749981308 enabled-instance-verbose-ai-logs=False enabled_feature_flags= first_chunk_duration_s=1.815417082994827 gitlab_feature_enabled_by_namespace_ids= gitlab_feature_enablement_type= gitlab_global_user_id=None gitlab_host_name=gdk.test gitlab_instance_id=703670d1-5ec7-4d88-a70a-f442840c0404 gitlab_language_server_version=None gitlab_realm=self-managed gitlab_saas_duo_pro_namespace_ids=None gitlab_version=18.4.0 http_version=1.1 meta.feature_category=global_search meta.unit_primitive=semantic_search_issue method=POST path=/v1/proxy/vertex-ai/v1/projects/PROJECT/locations/LOCATION/publishers/google/models/text-embedding-005%3Apredict request_arrived_at=2025-08-22T10:22:45.555717+00:00 response_start_duration_s=1.8153415830020094 status_code=200 token_issuer=http://localhost:5000/ url=http://0.0.0.0/v1/proxy/vertex-ai/v1/projects/PROJECT/locations/LOCATION/publishers/google/models/text-embedding-005:predict user_agent=Ruby
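When eyeballing a long access-log line like the one above, a tiny helper to pull out the key=value fields can help confirm the interesting bits (meta.unit_primitive, status_code, token_issuer). This is a throwaway sketch, not part of the MR:

```ruby
# Parse a logfmt-style line ("key=value key='quoted value' ...") into a Hash.
# Throwaway helper for inspecting the AI GW access log above.
def parse_log_fields(line)
  line.scan(/(\S+)=('[^']*'|\S+)/).to_h { |key, value| [key, value.delete("'")] }
end

sample = "status_code=200 meta.unit_primitive=semantic_search_issue " \
         "token_issuer=http://localhost:5000/ content_type='application/json; charset=UTF-8'"
fields = parse_log_fields(sample)

puts fields['meta.unit_primitive'] # => semantic_search_issue
puts fields['status_code']         # => 200
```

Here the log confirms the request was authorized for the semantic_search_issue UP and returned 200, which is exactly the no-op behavior we want to see.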

MR acceptance checklist

Evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.

Related to #561288 (closed)

Edited by Aleksei Lipniagov
