Build a dashboard with evaluations against Staging Ref
Overview
We're working on enabling AI features powered by self-hosted models on the Staging Ref environment; see Functional Testing: Configure self-hosted model... (#497784 - closed).
We should run evaluations against https://staging-ref.gitlab.com, similar to the ones we already run against https://staging.gitlab.com, and publish the results in a dashboard like https://lookerstudio.google.com/u/0/reporting/151b233a-d6ad-413a-9ebf-ea6efbf5387b/page/p_dt1c6y4xed.
Proposal
I initially thought we could reuse the existing configuration from feat(gitlab-models): make GITLAB_BASE_URL confi... (gitlab-org/modelops/ai-model-validation-and-research/ai-evaluation/prompt-library!794 - closed), but was advised to keep the two setups separate.
Let's configure a dedicated daily run pipeline, following https://gitlab.com/gitlab-org/modelops/ai-model-validation-and-research/ai-evaluation/prompt-library/-/blob/main/doc/how-to/configure_daily_runs.md?ref_type=heads, and create a dashboard based on the results.
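To illustrate what "keeping things separate" could look like, here is a minimal sketch of how a daily run might pick its target instance. It is not the prompt-library's actual implementation: only the two instance URLs and the GITLAB_BASE_URL variable name come from this issue, while EVAL_TARGET, EVAL_TARGETS, and resolve_base_url are hypothetical names used for illustration.

```python
import os
from urllib.parse import urlparse

# Hypothetical registry of evaluation targets. Only the two URLs are taken
# from this issue; the keys are made up for the sketch.
EVAL_TARGETS = {
    "staging": "https://staging.gitlab.com",
    "staging-ref": "https://staging-ref.gitlab.com",
}


def resolve_base_url(env=None):
    """Return the base URL a run should evaluate against.

    An explicit GITLAB_BASE_URL wins; otherwise the hypothetical EVAL_TARGET
    variable selects an entry from EVAL_TARGETS (assumed default: staging-ref).
    """
    env = os.environ if env is None else env
    url = env.get("GITLAB_BASE_URL")
    if not url:
        # Raises KeyError for an unknown target name.
        url = EVAL_TARGETS[env.get("EVAL_TARGET", "staging-ref")]
    parsed = urlparse(url)
    if parsed.scheme != "https" or not parsed.netloc:
        raise ValueError(f"Expected an https URL, got {url!r}")
    # Strip a trailing slash so API paths can be appended consistently.
    return url.rstrip("/")
```

With this shape, the Staging Ref pipeline could set its own variable without touching the existing Staging runs, and each pipeline's results could feed a separate dashboard data source.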