Run evaluation in merge request pipelines when a Duo feature is changed
## Problem
We currently run evaluations manually in the Prompt Library or ELI5 projects. This is not practical because:
- Developers have to set up and run an evaluation pipeline in their local environment every time they change a Duo feature.
- Developers can merge a change without running evaluations.
## Target changes
- Model
- Prompt
- Input Parser
- Output Parser
## Proposal
- Run the evaluation pipeline in merge request pipelines when a Duo feature is changed. This applies to both GitLab-Rails and the AI Gateway.
- Fail the job when it detects a quality degradation, which effectively prevents developers from merging the change.
- Introduce a new label to skip the job in an emergency (like `pipeline:skip-undercoverage`).
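The proposal above could be sketched as a `.gitlab-ci.yml` job. This is only an illustration: the job name, watched paths, script, and skip label are assumptions, not the actual implementation.

```yaml
# Hypothetical evaluation job. Paths, script, and label name are
# placeholders for illustration only.
duo-evaluation:
  stage: test
  rules:
    # Emergency escape hatch, analogous to pipeline:skip-undercoverage:
    # skip the job when the MR carries the (hypothetical) skip label.
    - if: '$CI_MERGE_REQUEST_LABELS =~ /pipeline:skip-duo-evaluation/'
      when: never
    # Run only in merge request pipelines, and only when Duo feature
    # code changed (example path; the real patterns would need to cover
    # the model, prompt, and parser code in GitLab-Rails and AI Gateway).
    - if: '$CI_PIPELINE_SOURCE == "merge_request_event"'
      changes:
        - ee/lib/gitlab/llm/**/*
  script:
    # A non-zero exit on quality degradation fails the job and blocks the merge.
    - ./scripts/run_duo_evaluation.sh
```

With `rules:changes` scoped this way, MRs that do not touch Duo feature code skip the job entirely, so the evaluation cost is only paid where it matters.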