Migrate Vertex AI Proxy to new Tokens path

What does this MR do and why?

Prepare to migrate the Vertex AI Proxy to the new Tokens path: use CloudConnector::Tokens.

More information about UPs used with this endpoint: #554541 (closed).

Impact

This change is expected to be a no-op.
The only Unit Primitive currently used with the Vertex AI proxy endpoint is semantic_search_issue (see the thread below).
It should not affect semantic_search_issue either, but we can validate this change for that UP.

Rollout

⚠️ Note that I am not including Vertex-related UPs in ROLLED_OUT_UNIT_PRIMITIVES in this MR, and I am not introducing FFs in this MR either.
That means this MR continues to use the old token flow for now; the only technical difference is that we now pass a service with the corresponding UP name (not the generic vertex_ai_proxy) into the Gitlab::AiGateway.headers call. Note that we now have an ai_feature_name: argument, which is supplied with vertex_ai_proxy to search through the catalogue: https://gitlab.com/gitlab-org/gitlab/-/blob/37088bda8311f33aec98fedd6adb5cdf2e6b716d/ee/lib/gitlab/llm/utils/ai_features_catalogue.rb#L175.
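The per-UP service resolution can be pictured roughly as follows. This is an illustrative Ruby sketch only: the CATALOGUE constant and service_for helper are made up for this example; the real lookup lives in ai_features_catalogue.rb.

```ruby
# Hypothetical sketch (not the actual implementation): resolve a Cloud
# Connector service per Unit Primitive instead of the generic :vertex_ai_proxy.
CATALOGUE = {
  'semantic_search_issue' => :semantic_search_issue
}.freeze

def service_for(unit_primitive)
  # Fall back to the generic proxy service when the UP is not catalogued.
  CATALOGUE.fetch(unit_primitive, :vertex_ai_proxy)
end

puts service_for('semantic_search_issue') # => semantic_search_issue
puts service_for('some_unknown_up')       # => vertex_ai_proxy
```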

Our plan is to roll out code_suggestions_new_tokens_path and anthropic_proxy_new_tokens_path and then drop the fallback mechanism entirely, so all new UPs will be rolled out automatically; that will include Vertex-related UPs and all services migrated to the new Tokens path in the future.
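The interim gating described above can be sketched as follows. This is an assumption-laden illustration, not the real code: the constant and predicate names are hypothetical; only the flag-name pattern (code_suggestions_new_tokens_path, anthropic_proxy_new_tokens_path) comes from the plan above.

```ruby
# Illustrative sketch of the rollout gate (not the actual implementation):
# a UP uses the new Tokens path once it is either listed in
# ROLLED_OUT_UNIT_PRIMITIVES or enabled via a per-endpoint feature flag.
ROLLED_OUT_UNIT_PRIMITIVES = %w[code_suggestions].freeze

def new_tokens_path?(unit_primitive, enabled_flags)
  ROLLED_OUT_UNIT_PRIMITIVES.include?(unit_primitive) ||
    enabled_flags.include?("#{unit_primitive}_new_tokens_path")
end

puts new_tokens_path?('code_suggestions', [])                              # => true
puts new_tokens_path?('anthropic_proxy', ['anthropic_proxy_new_tokens_path']) # => true
puts new_tokens_path?('semantic_search_issue', [])                         # => false
```

Dropping the fallback amounts to making this predicate return true unconditionally, which is why future UPs would not need their own flags.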

How to set up and validate locally

We want to verify that the endpoint with semantic_search_issue works as before.

To do that, I followed the instructions here: https://gitlab.com/gitlab-org/gitlab/-/blob/master/doc/development/ai_features/embeddings.md?plain=0&ref_type=heads#adding-work-item-embeddings-locally

As a final step, running Gitlab::Llm::VertexAi::Embeddings::Text.new('text', user: nil, tracking_context: {}, unit_primitive: 'semantic_search_issue').execute succeeded (I got a vector back), and I saw the related entries in my AI GW logs:

2025-08-22 12:22:47 [info     ] Request to LLM complete        correlation_id=6b3377011d099b5ac1552cd8ebc1e055 duration=0.8583916669886094 source=ai_gateway.instrumentators.model_requests
2025-08-22 12:22:47 [info     ] 127.0.0.1:52305 - "POST /v1/proxy/vertex-ai/v1/projects/PROJECT/locations/LOCATION/publishers/google/models/text-embedding-005%3Apredict HTTP/1.1" 200 auth_duration_s=0.19122983299894258 client_ip=127.0.0.1 client_port=52305 content_type='application/json; charset=UTF-8' correlation_id=6b3377011d099b5ac1552cd8ebc1e055 cpu_s=0.15309600000000145 duration_request=0.01607990264892578 duration_s=1.8154523749981308 enabled-instance-verbose-ai-logs=False enabled_feature_flags= first_chunk_duration_s=1.815417082994827 gitlab_feature_enabled_by_namespace_ids= gitlab_feature_enablement_type= gitlab_global_user_id=None gitlab_host_name=gdk.test gitlab_instance_id=703670d1-5ec7-4d88-a70a-f442840c0404 gitlab_language_server_version=None gitlab_realm=self-managed gitlab_saas_duo_pro_namespace_ids=None gitlab_version=18.4.0 http_version=1.1 meta.feature_category=global_search meta.unit_primitive=semantic_search_issue method=POST path=/v1/proxy/vertex-ai/v1/projects/PROJECT/locations/LOCATION/publishers/google/models/text-embedding-005%3Apredict request_arrived_at=2025-08-22T10:22:45.555717+00:00 response_start_duration_s=1.8153415830020094 status_code=200 token_issuer=http://localhost:5000/ url=http://0.0.0.0/v1/proxy/vertex-ai/v1/projects/PROJECT/locations/LOCATION/publishers/google/models/text-embedding-005:predict user_agent=Ruby
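When eyeballing a long access-log line like the one above, a tiny helper to pull out the key=value fields can help confirm the interesting bits (meta.unit_primitive, status_code, token_issuer). This is a throwaway sketch, not part of the MR:

```ruby
# Parse a logfmt-style line ("key=value key='quoted value' ...") into a Hash.
# Throwaway helper for inspecting the AI GW access log above.
def parse_log_fields(line)
  line.scan(/(\S+)=('[^']*'|\S+)/).to_h { |key, value| [key, value.delete("'")] }
end

sample = "status_code=200 meta.unit_primitive=semantic_search_issue " \
         "token_issuer=http://localhost:5000/ content_type='application/json; charset=UTF-8'"
fields = parse_log_fields(sample)

puts fields['meta.unit_primitive'] # => semantic_search_issue
puts fields['status_code']         # => 200
```

Here the log confirms the request was authorized for the semantic_search_issue UP and returned 200, which is exactly the no-op behavior we want to see.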

MR acceptance checklist

Evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.

Related to #561288 (closed)

Edited by Aleksei Lipniagov
