Use a non-prod environment for evaluating GitLab AI features

Problem to solve

We currently run evaluations for GitLab AI features against the production environment. As the number of evaluations grows, this setup poses several challenges and risks:

The current GraphQL rate limit slows down evaluation pipelines.
We risk overloading production and causing incidents, both by overloading the GitLab worker fleets and by hitting third-party API limits.
Evaluation requests can skew product adoption metrics, because they must be excluded from real user usage.
Whitelisting user accounts from the rate limit requires security and infrastructure approval, which slows down development whenever we need to provision new test accounts.

Impact on the AI evaluation roadmap:

Slows down the Duo Chat migration to GraphQL (gitlab-org/gitlab#466662).
Slows down the execution of Root Cause Analysis (RCA) evaluations.
Slows down the execution of Explain This Vulnerability (ETV) evaluations.

Proposal 💡

Move evaluations for GitLab AI features to the staging environment.

graph LR
    A[Move to Staging] --> B[Whitelist Test Accounts]
    A --> C[Setup Third-party Integrations]
    A --> D[Seed Test Data]
    
    B --> E[Increased Rate Limits]
    C --> F[AI Gateway]
    C --> G[API Integrations]
    D --> H[Job Traces for RCA]
    D --> I[Vulnerability Reports for ETV]

Whitelist test accounts on staging so they get higher request limits.
Ensure third-party AI Gateway and API integrations are set up correctly.
Seed test data as required (e.g., job traces for RCA and vulnerability reports for ETV).
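One way an evaluation harness could enforce the move to staging is to make staging the default target and refuse production unless someone opts in explicitly. The Python below is a minimal sketch of that idea. The environment variable names `EVAL_TARGET` and `EVAL_ALLOW_PRODUCTION` are hypothetical, and the staging hostname `https://staging.gitlab.com` is an assumption. Only the `/api/graphql` path is GitLab's standard GraphQL endpoint.

```python
from dataclasses import dataclass
from typing import Mapping

# Hypothetical target table; the staging hostname is an assumption, not taken from this issue.
TARGETS = {
    "staging": "https://staging.gitlab.com",
    "production": "https://gitlab.com",
}


@dataclass(frozen=True)
class EvalTarget:
    name: str
    graphql_url: str


def select_target(env: Mapping[str, str]) -> EvalTarget:
    """Pick the GitLab instance an evaluation run talks to.

    Defaults to staging. Production requires an explicit opt-in flag, so a
    misconfigured pipeline cannot silently send evaluation traffic to production.
    """
    name = env.get("EVAL_TARGET", "staging")  # hypothetical variable name
    if name not in TARGETS:
        raise ValueError(f"unknown evaluation target: {name!r}")
    if name == "production" and env.get("EVAL_ALLOW_PRODUCTION") != "1":
        raise RuntimeError("refusing to evaluate against production without EVAL_ALLOW_PRODUCTION=1")
    return EvalTarget(name=name, graphql_url=f"{TARGETS[name]}/api/graphql")
```

In a pipeline this would typically be called as `select_target(os.environ)`. Failing loudly is a deliberate choice here: it is cheaper to debug a failed job than an incident caused by evaluation traffic on production.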

Technical Implementation Plan 🛠️

1. Ensure Third-party AI Gateway and API Integrations

Set up Vertex AI and Anthropic integrations.
If required, configure the provider accounts with limits high enough to support evaluation.
Set up an Ultimate license and the Duo Pro add-on.
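Before an evaluation pipeline starts, a preflight step could confirm that each of these prerequisites is in place, so the pipeline fails fast instead of producing a run full of errors. The sketch below shows only the aggregation logic. The individual probes, such as how to check the Vertex AI or Anthropic integration or the license and add-on on staging, are left as hypothetical callables because this issue does not specify them.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Prerequisite:
    name: str
    # Hypothetical probe: returns True when the prerequisite is in place.
    check: Callable[[], bool]


def missing_prerequisites(prereqs: List[Prerequisite]) -> List[str]:
    """Return the names of prerequisites whose probe failed or raised."""
    missing = []
    for p in prereqs:
        try:
            ok = p.check()
        except Exception:
            # A probe that errors out (timeout, auth failure, ...) counts as "not ready".
            ok = False
        if not ok:
            missing.append(p.name)
    return missing
```

A pipeline would abort with the returned list as its error message whenever the list is non-empty.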

2. Seed Evaluation Data

Initially focus on Root Cause Analysis (RCA) and Explain This Vulnerability (ETV), then expand to other existing features (such as Duo Chat) and to upcoming AI features.

graph LR
    A[Seed Evaluation Data] --> B[Set Up Test Projects]
    A --> C[Seed CI Log Traces for RCA]
    A --> D[Seed Vulnerability Reports for ETV]
    A --> E[Plan for Expansion]
    
    E --> F[Duo Chat]
    E --> G[Upcoming AI Features]

Set up test projects - group::ai model validation.
Seed CI log traces for RCA - group::pipeline execution, #356 (closed).
Seed vulnerability reports for ETV - group::threat insights, https://gitlab.com/gitlab-org/modelops/ai-model-validation-and-research/ai-evaluation/prompt-library/-/work_items/378.

Seed data manually at first, then move to a more automated approach.
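One possible direction for automating the RCA seeding is to generate CI jobs that fail for a known reason. That way, every resulting job trace comes with a reference root cause that the evaluation can compare against. The Python below is a sketch of that idea. It renders a `.gitlab-ci.yml` (using the standard `image` and `script` keywords) from a small catalogue of failure scenarios. The scenario names, scripts, expected answers, and the `python:3.12-slim` image choice are all hypothetical illustrations, not the data planned in #356.

```python
import json
from typing import Dict

# Hypothetical catalogue: each job fails for exactly one known reason,
# so the expected RCA answer is known before the trace is produced.
FAILURE_SCENARIOS: Dict[str, Dict[str, str]] = {
    "missing_dependency": {
        "script": "python -c 'import package_that_does_not_exist'",
        "expected_root_cause": "Python module not installed",
    },
    "failing_unit_test": {
        "script": "python -c 'assert 1 + 1 == 3, \"arithmetic test failed\"'",
        "expected_root_cause": "Assertion failure in test",
    },
    "missing_file": {
        "script": "cat config/does-not-exist.yml",
        "expected_root_cause": "Referenced file does not exist",
    },
}


def render_ci_config(scenarios: Dict[str, Dict[str, str]] = FAILURE_SCENARIOS,
                     image: str = "python:3.12-slim") -> str:
    """Render one deliberately failing job per scenario as .gitlab-ci.yml text."""
    lines = []
    for job_name, spec in scenarios.items():
        lines.append(f"{job_name}:")
        lines.append(f"  image: {image}")
        lines.append("  script:")
        # json.dumps yields a double-quoted string, which is also a valid YAML scalar.
        lines.append(f"    - {json.dumps(spec['script'])}")
        lines.append("")
    return "\n".join(lines)


def expected_answers(scenarios: Dict[str, Dict[str, str]] = FAILURE_SCENARIOS) -> Dict[str, str]:
    """Map each job name to the root cause an RCA answer should identify."""
    return {job: spec["expected_root_cause"] for job, spec in scenarios.items()}
```

The rendered text would be committed as `.gitlab-ci.yml` in a staging test project. The failed job traces then become RCA inputs, and `expected_answers()` provides the reference side of the comparison.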

3. Set Up Test Accounts with Increased Rate Limits

Make the rate limit in ApplicationRateLimiter configurable - group::ai framework, gitlab-org/gitlab!149945 (merged).
Raise the aiAction rate limit to 5000 requests per 8 hours - https://gitlab.com/gitlab-com/gl-infra/production-engineering/-/issues/25606.
Set up multiple service accounts to scale out to more evaluation pipelines.
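The 5000 requests / 8 hours figure translates into a steady rate of 625 requests per hour, or one request every 28,800 / 5000 = 5.76 seconds per account. Under the assumption that the limit applies per account, N service accounts give roughly N times that throughput. The Python below sketches both the capacity arithmetic and a client-side round-robin over service accounts. Each account is tracked with a fixed-window counter. This approximates a threshold-per-interval limit but is not GitLab's Ruby `ApplicationRateLimiter`, and its window boundaries will not line up exactly with the server's. The account names in the usage example are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Dict, Iterable, Optional

# From the linked infra issue: 5000 aiAction requests per 8 hours, assumed to apply per account.
LIMIT = 5000
WINDOW_SECONDS = 8 * 60 * 60  # 28,800 s


def seconds_between_requests(limit: int = LIMIT, window_seconds: float = WINDOW_SECONDS) -> float:
    """Steady-state spacing that keeps a single account under its limit."""
    return window_seconds / limit


def accounts_needed(total_requests: int, hours_available: float,
                    limit: int = LIMIT, window_hours: float = 8.0) -> int:
    """Conservative account count for a run, assuming requests are spread evenly."""
    per_account_capacity = limit * (hours_available / window_hours)
    return math.ceil(total_requests / per_account_capacity)


@dataclass
class FixedWindowBudget:
    """Client-side request budget for one account (approximation of the server limit)."""
    limit: int = LIMIT
    window_seconds: float = WINDOW_SECONDS
    window_start: Optional[float] = None
    used: int = 0

    def try_acquire(self, now: float) -> bool:
        # Open a new window on first use or once the current window has expired.
        if self.window_start is None or now - self.window_start >= self.window_seconds:
            self.window_start = now
            self.used = 0
        if self.used < self.limit:
            self.used += 1
            return True
        return False


class AccountPool:
    """Round-robin over service accounts, skipping accounts that exhausted their window."""

    def __init__(self, accounts: Iterable[str], limit: int = LIMIT,
                 window_seconds: float = WINDOW_SECONDS) -> None:
        self._order = list(accounts)
        self._budgets: Dict[str, FixedWindowBudget] = {
            name: FixedWindowBudget(limit, window_seconds) for name in self._order
        }
        self._next = 0

    def pick(self, now: float) -> Optional[str]:
        """Return the account to use for the next request, or None if all are out of budget."""
        for _ in range(len(self._order)):
            name = self._order[self._next]
            self._next = (self._next + 1) % len(self._order)
            if self._budgets[name].try_acquire(now):
                return name
        return None  # every account is exhausted; the caller should wait and retry
```

For example, an evaluation run of 20,000 requests that must finish within 8 hours needs `accounts_needed(20000, 8)`, which is 4 accounts. Spreading load across accounts client-side keeps each account comfortably under its limit instead of relying on server-side rejections.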

Edited Aug 08, 2024 by Mon Ray