
Use Vertex AI proxy endpoints in VertexAi::Client

NOTE: This is a high-priority MR to meet the deadline.

What does this merge request do and why?

This MR switches VertexAi::Client to AI Gateway's Vertex AI proxy endpoints. See AI Gateway ADR 002: Exposing proxy endpoints to AI providers for an overview of the changes.

This change is behind the use_ai_gateway_proxy feature flag, which is disabled by default.

The main goal of these endpoints is to enable independent AI features on self-managed instances within the proposed timeline. See the issue and this issue for more information.
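
For context, the sketch below shows the kind of routing switch the flag introduces. It is illustrative only: vertex_url and ai_gateway_url are assumed names for this sketch, not the MR's actual code.

# Illustrative sketch only: method and helper names are assumptions,
# not the MR's actual implementation.
def vertex_url(model:, project:, location:)
  base =
    if Feature.enabled?(:use_ai_gateway_proxy)
      # Route through the AI Gateway's Vertex AI proxy prefix.
      "#{ai_gateway_url}/v1/proxy/vertex-ai"
    else
      # Call the Vertex AI API directly, as before.
      "https://#{location}-aiplatform.googleapis.com"
    end

  "#{base}/v1/projects/#{project}/locations/#{location}/publishers/google/models/#{model}:predict"
end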

MR acceptance checklist

Please evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.

Screenshots or screen recordings

Test VertexAi::Client from the Rails console:

[6] pry(main)> Gitlab::Llm::VertexAi::Client.new(User.first, unit_primitive: 'explain_vulnerability').chat(content: "Hi, how are you?")
  User Load (0.4ms)  SELECT "users".* FROM "users" ORDER BY "users"."id" ASC LIMIT 1
=> {"predictions"=>
  [{"safetyAttributes"=>[{"categories"=>[], "scores"=>[], "blocked"=>false}],
    "citationMetadata"=>[{"citations"=>[]}],
    "groundingMetadata"=>[{}],
    "candidates"=>[{"content"=>" I'm doing great, thanks for asking! How can I help you today?", "author"=>"1"}]}],
 "metadata"=>{"tokenMetadata"=>{"outputTokenCount"=>{"totalBillableCharacters"=>50, "totalTokens"=>17}, "inputTokenCount"=>{"totalBillableCharacters"=>13, "totalTokens"=>6}}}}

AI Gateway logs

Access request log:

{
    "url": "http://localhost/v1/proxy/vertex-ai/v1/projects/PROJECT/locations/LOCATION/publishers/google/models/codechat-bison:predict",
    "path": "/v1/proxy/vertex-ai/v1/projects/PROJECT/locations/LOCATION/publishers/google/models/codechat-bison%3Apredict",
    "status_code": 200,
    "method": "POST",
    "correlation_id": "34c3281ddd5e013305244f94c24d3166",
    "http_version": "1.1",
    "client_ip": "127.0.0.1",
    "client_port": 45488,
    "duration_s": 4.9186735000002955,
    "duration_request": -1,
    "cpu_s": 0.06469550999999996,
    "user_agent": "Ruby",
    "gitlab_instance_id": "ba75b213-4fd4-4311-8631-0ac7a1bd3247",
    "gitlab_global_user_id": "Cv4L37An7TsFzTjzy4yCixBZwUsK8+TCQYl7EYHVN8c=",
    "gitlab_host_name": "gdk.test",
    "gitlab_saas_duo_pro_namespace_ids": null,
    "gitlab_saas_namespace_ids": null,
    "gitlab_realm": "saas",
    "auth_duration_s": 0.7781890819969703,
    "meta.feature_category": "vulnerability_management",
    "meta.unit_primitive": "explain_vulnerability",
    "logger": "api.access",
    "level": "info",
    "type": "mlops",
    "stage": "main",
    "timestamp": "2024-05-16T06:44:23.841030Z",
    "message": "127.0.0.1:45488 - \"POST /v1/proxy/vertex-ai/v1/projects/PROJECT/locations/LOCATION/publishers/google/models/codechat-bison%3Apredict HTTP/1.1\" 200"
}

Proxy request log:

{
    "correlation_id": "34c3281ddd5e013305244f94c24d3166",
    "logger": "httpx",
    "level": "info",
    "type": "mlops",
    "stage": "main",
    "timestamp": "2024-05-16T06:44:23.838178Z",
    "message": "HTTP Request: POST https://us-central1-aiplatform.googleapis.com/v1/projects/ai-enablement-dev-69497ba7/locations/us-central1/publishers/google/models/codechat-bison:predict \"HTTP/1.1 200 OK\""
}
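
Note that the two entries share the same correlation_id: the access log records the client's request to the gateway's /v1/proxy/vertex-ai prefix, and the proxy log records the forwarded request to the upstream Vertex AI host. A rough sketch of that path rewrite follows (illustrative Ruby; the gateway itself is a Python service, and the upstream host is hardcoded here for brevity):

require 'cgi'

PROXY_PREFIX = "/v1/proxy/vertex-ai"
UPSTREAM = "https://us-central1-aiplatform.googleapis.com" # assumed fixed for this sketch

# Strip the proxy prefix and unescape the path before forwarding upstream.
def upstream_url(request_path)
  "#{UPSTREAM}#{CGI.unescape(request_path.delete_prefix(PROXY_PREFIX))}"
end

upstream_url("/v1/proxy/vertex-ai/v1/projects/PROJECT/locations/LOCATION/publishers/google/models/codechat-bison%3Apredict")
# => "https://us-central1-aiplatform.googleapis.com/v1/projects/PROJECT/locations/LOCATION/publishers/google/models/codechat-bison:predict"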

How to set up and validate locally

  1. Check out feat: use OR expression for the required scope ... (gitlab-org/modelops/applied-ml/code-suggestions/ai-assist!814 - merged) in AI Gateway.
  2. Check out this MR in GitLab-Rails.
  3. Enable the feature flag: ::Feature.enable(:use_ai_gateway_proxy).
  4. Follow the Optional: Test with OIDC authentication section, then verify from the Rails console as sketched below.
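
A quick way to verify the setup from the Rails console (expected results are sketched in the comments):

# Confirm the flag is enabled.
Feature.enabled?(:use_ai_gateway_proxy)
# => true

# Send a chat request; it should return a "predictions" response like the one
# above and produce a matching /v1/proxy/vertex-ai entry in the AI Gateway logs.
Gitlab::Llm::VertexAi::Client.new(User.first, unit_primitive: 'explain_vulnerability').chat(content: "Hi, how are you?")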

Further reading
