🚀 Migration Schedule for Daily Runs to Staging Environment

📋 Overview

This overview issue tracks the migration schedule for daily runs as we transition to the staging environment. It also captures related details such as accounts, error limits, and the areas the team is investigating.

The schedule covers RCA, Chat, and Vulnerability Explanation. Code Suggestion had no daily run until now, but we will add one as part of the move to staging. Interim production runs will continue until the full migration is completed. Details of the migration are here: #346 (closed)

🏗️ System Design with token usage for daily runs

Currently, the Eval judges for both Prod and Staging use the Anthropic Eval account.
The red blocks mark areas where troubleshooting is in progress.
The Prod account carries the feature token usage; details and estimates are in the table below.

graph LR
    DailyRuns["Daily Runs"]
    CEF

    subgraph Stg
        direction LR
        LS(*GraphQL Limit)
        CFS[Cloudflare]
        LBS[Load Balancer]
        GLS[GitLab]
        GWS[AI Gateway]

        LS --> CFS --> LBS --> GLS --> GWS
    end

    subgraph Prod
        direction LR
        LP(*GraphQL Limit)
        CFP[Cloudflare]
        LBP[Load Balancer]
        GLP[GitLab]
        GWP[AI Gateway]

        LP --> CFP --> LBP --> GLP --> GWP
    end

    subgraph Environment
        direction TB
        Stg
        Prod
    end
 
    CEF --> Stg
    CEF --> Prod
    DailyRuns["Daily Runs"] --> CEF
    
    Stg --> AE[Anthropic Eval]
    AE -->  RCAS[RCA]

    Prod --> AP[Anthropic Prod]
    AP -->  RCA[RCA]
    AP -->  ETV[ETV]
    AP -->  CS[Code Suggestion]
    AP -->  D[Duo Feature]
 
    CEF --> EJ[*Eval judge]
    EJ --> AE
    classDef ap fill:#74992e,stroke:#333,stroke-width:2px
    classDef ae fill:#4287f5,stroke:#333,stroke-width:2px
    classDef at fill:#f54242,stroke:#333,stroke-width:2px
    class AP ap
    class AE ae
    class LS,EJ at
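
The account routing shown in the diagram can be summarised as a small lookup table. This is only an illustrative sketch: the environment, role, and account labels are hypothetical names chosen for this example, not real configuration keys.

```python
# Illustrative routing table derived from the diagram above.
# All keys and account labels are hypothetical, chosen for this sketch.
ACCOUNT_ROUTING = {
    ("staging", "feature"): "anthropic_eval",     # Stg --> Anthropic Eval
    ("staging", "judge"): "anthropic_eval",       # Eval judge --> Anthropic Eval
    ("production", "feature"): "anthropic_prod",  # Prod --> Anthropic Prod
    ("production", "judge"): "anthropic_eval",    # Eval judge --> Anthropic Eval
}


def account_for(environment: str, role: str) -> str:
    """Return the account label used for an environment and call role."""
    try:
        return ACCOUNT_ROUTING[(environment, role)]
    except KeyError:
        raise ValueError(f"unknown environment/role: {environment}/{role}") from None


print(account_for("production", "judge"))  # anthropic_eval
```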

🚧 Current Progress and limitations on Staging

For staging, the current work focuses on the red blocks in the diagram above.

🛠️ DRI: infrastructure
- We are hitting a GraphQL limit error because the staging and production environments differ, and we need Infra support to resolve it. The limit could come from any staging component: Cloudflare, the load balancer, or GitLab.
🧠 DRI: groupai model validation
- We are also improving the robustness of the judge, which is returning null values: https://gitlab.com/gitlab-org/modelops/ai-model-validation-and-research/ai-evaluation/prompt-library/-/issues/429
👥 Feature Teams:
- a) ✅ RCA seeding data is completed.
- b) 💬 The Chat team is working on seeding data.
- c) 🔒 Vulnerability seeding data needs to be validated as well.
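
One common way to guard against a judge that returns null values, as described under the model validation item above, is to retry a bounded number of times and surface a persistent null to the caller. The sketch below is a generic illustration and not the approach tracked in the linked issue; `score_with_retry` and the stub judge are hypothetical names.

```python
from typing import Callable, Optional


def score_with_retry(
    judge: Callable[[str], Optional[float]],
    prompt: str,
    max_attempts: int = 3,
) -> Optional[float]:
    """Call the judge up to max_attempts times; return the first non-null score."""
    for _ in range(max_attempts):
        score = judge(prompt)
        if score is not None:
            return score
    # Every attempt returned null: let the caller decide how to record it.
    return None


# Hypothetical stub judge that returns null twice before giving a score.
_responses = iter([None, None, 4.0])


def flaky_judge(prompt: str) -> Optional[float]:
    return next(_responses)


print(score_with_retry(flaky_judge, "example prompt"))  # 4.0
```

Returning `None` after the last attempt keeps persistent nulls visible in the results instead of silently replacing them with a default score.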

📅 Interim Recommendation for production run schedule for features as we migrate to staging

Note: For all production runs, the Evaluator judges use the Anthropic Eval account and feature inference uses the production account.

| Feature | Daily run production schedule until staging migration | Feature requests and token usage | Task for migration to staging | Priority for staging | Estimated staging migration date |
|---|---|---|---|---|---|
| Root Cause Analysis | 900 prompts/day (Monday, Wednesday, Friday) | Max requests: 20 requests/min. Token usage: 35K tokens × 16 batch × 2.5 batch/min = 1.4M tokens/min (17.5% of the production limit). Rough calculation; feature teams are not tracking this. | Data seeding completed | priority1 | After GraphQL error support from Infra. Estimated date: Aug 15 |
| Duo Chat | Full dataset (Tuesday, Thursday, Saturday) | Max requests: 50 requests/min. Is it tracked by the feature team? | Data seeding in progress | priority2 | After GraphQL error support from Infra and seeding data. Estimated date: Aug 17 |
| Vulnerability Explanation | On hold until staging | Is it tracked by the feature team? | Data seeding in review | priority2 | After GraphQL error support from Infra and seeding data review. Estimated date: Aug 21 |
| Vulnerability Resolution | On hold until staging | | Data seeding in review | priority2 | After GraphQL error support from Infra and seeding data review. Estimated date: Aug 21 |
| Code Suggestion | Build after migration to staging | | Not yet started | priority3 | TBD |
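
The rough token estimate in the Root Cause Analysis row and the interim weekday schedule can be checked with a short sketch. Reading the figures as 35K tokens per prompt, a batch factor of 16, and 2.5 batches per minute is an assumption, as is the production limit, which is only implied by the 17.5% figure. All names below are hypothetical.

```python
import datetime

# Figures from the Root Cause Analysis row; the interpretation is an assumption.
TOKENS_PER_PROMPT = 35_000
BATCH_FACTOR = 16
BATCHES_PER_MINUTE = 2.5
SHARE_OF_PROD_LIMIT = 0.175  # "17.5% of the Production limit"

tokens_per_minute = TOKENS_PER_PROMPT * BATCH_FACTOR * BATCHES_PER_MINUTE
# Limit implied by the stated share; not a documented quota.
implied_prod_limit = tokens_per_minute / SHARE_OF_PROD_LIMIT

print(f"{tokens_per_minute:,.0f} tokens/min")          # 1,400,000 tokens/min
print(f"{implied_prod_limit:,.0f} tokens/min implied")  # 8,000,000 tokens/min implied

# Interim production schedule from the table (Monday is weekday 0).
INTERIM_SCHEDULE = {
    "Root Cause Analysis": {0, 2, 4},  # Monday, Wednesday, Friday
    "Duo Chat": {1, 3, 5},             # Tuesday, Thursday, Saturday
}


def features_for(day: datetime.date) -> list[str]:
    """Return the features with an interim production run on the given date."""
    return [name for name, days in INTERIM_SCHEDULE.items() if day.weekday() in days]


print(features_for(datetime.date(2024, 8, 12)))  # ['Root Cause Analysis']
```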

Edited Sep 30, 2024 by Tan Le