Build a dashboard with evaluations against Staging Ref

Overview

We're working on enabling AI features powered by self-hosted models on the Staging Ref environment: Functional Testing: Configure self-hosted model... (#497784 - closed).

It would be great to run evaluations against https://staging-ref.gitlab.com, similar to the ones we already run for https://staging.gitlab.com, and to get a dashboard like this one: https://lookerstudio.google.com/u/0/reporting/151b233a-d6ad-413a-9ebf-ea6efbf5387b/page/p_dt1c6y4xed.

Proposal

I thought that we could re-use the existing configuration: feat(gitlab-models): make GITLAB_BASE_URL confi... (gitlab-org/modelops/ai-model-validation-and-research/ai-evaluation/prompt-library!794 - closed), but I was advised to keep things separate.

Let's configure a separate daily run pipeline for Staging Ref (see https://gitlab.com/gitlab-org/modelops/ai-model-validation-and-research/ai-evaluation/prompt-library/-/blob/main/doc/how-to/configure_daily_runs.md?ref_type=heads) and create a dashboard based on its results.
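
To illustrate what "keeping things separate" could look like, here is a minimal sketch of per-environment target definitions for a daily run. It is not the prompt-library's actual configuration format: `EvalTarget`, `TARGETS`, `pipeline_env`, and the `results_table` names are all hypothetical. It also assumes the evaluation code reads its target from a `GITLAB_BASE_URL` environment variable, which the merge request title above suggests but which is not confirmed here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvalTarget:
    """One evaluation target. All field names are hypothetical."""
    name: str
    base_url: str
    results_table: str  # hypothetical destination the dashboard would read from


# One entry per environment, so the Staging Ref run does not share
# configuration with the existing Staging run.
TARGETS = {
    "staging": EvalTarget(
        name="staging",
        base_url="https://staging.gitlab.com",
        results_table="daily_runs_staging",
    ),
    "staging-ref": EvalTarget(
        name="staging-ref",
        base_url="https://staging-ref.gitlab.com",
        results_table="daily_runs_staging_ref",
    ),
}


def pipeline_env(target_name: str) -> dict:
    """Build the environment variables for one daily run.

    Assumption: the evaluation reads GITLAB_BASE_URL; RESULTS_TABLE is a
    made-up variable used only for this illustration.
    """
    if target_name not in TARGETS:
        raise ValueError(f"unknown evaluation target: {target_name!r}")
    target = TARGETS[target_name]
    return {
        "GITLAB_BASE_URL": target.base_url,
        "RESULTS_TABLE": target.results_table,
    }


if __name__ == "__main__":
    print(pipeline_env("staging-ref"))
```

With separate entries, the Staging Ref dashboard can point at its own results without touching what the existing Staging dashboard reads.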

Edited Oct 09, 2024 by Igor Drozdov