# Run evaluation in merge request pipelines when a Duo feature is changed
## Problem

We currently run evaluations manually in the Prompt Library or ELI5 projects. This is not practical because:

- Developers have to set up and run an evaluation pipeline in their local environment every time they change a Duo feature.
- Developers can merge a change without running any evaluation.
## Target changes

- Model
- Prompt
- Input Parser
- Output Parser
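
A minimal sketch of how changes to these components could trigger the job in a merge request pipeline. The job name, paths, and script are assumptions for illustration only; the real file layout in GitLab-Rails and AI Gateway may differ:

```yaml
# Hypothetical CI job; names and paths are illustrative, not the actual layout.
duo-evaluation:
  stage: test
  rules:
    # Run only in merge request pipelines, and only when Duo-related files change.
    - if: '$CI_PIPELINE_SOURCE == "merge_request_event"'
      changes:
        - ee/lib/gitlab/llm/**/*   # assumed path: model, prompt, and parser code
        - ai_gateway/**/*          # assumed path: AI Gateway changes
  script:
    - ./run-evaluation.sh          # placeholder for the evaluation entrypoint
```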
## Proposal

- Run the evaluation pipeline in merge request pipelines when a Duo feature is changed. This covers both GitLab-Rails and AI Gateway.
- Fail the job when it detects a quality degradation, which effectively prevents developers from merging the change.
- Introduce a new label to skip the job in an emergency (like `pipeline:skip-undercoverage`).
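
One way the quality-degradation check above could work: compare the new evaluation scores against a stored baseline and exit nonzero on regression, so the CI job fails and blocks the merge. Everything here (file format, metric names, threshold, function name) is an assumption sketched for illustration, not the actual evaluation tooling:

```python
import json
import sys

# Maximum acceptable score drop per metric; the value is an assumption.
MAX_ALLOWED_DROP = 0.05


def check_degradation(baseline_path: str, current_path: str) -> bool:
    """Return True if any metric in the current run degrades beyond the threshold.

    Both files are assumed to be JSON maps of metric name -> score in [0, 1].
    """
    with open(baseline_path) as f:
        baseline = json.load(f)
    with open(current_path) as f:
        current = json.load(f)

    degraded = False
    for metric, base_score in baseline.items():
        drop = base_score - current.get(metric, 0.0)
        if drop > MAX_ALLOWED_DROP:
            print(f"{metric}: dropped by {drop:.3f} (baseline {base_score:.3f})")
            degraded = True
    return degraded


if __name__ == "__main__" and len(sys.argv) == 3:
    # Exit nonzero on degradation so the CI job fails.
    sys.exit(1 if check_degradation(sys.argv[1], sys.argv[2]) else 0)
```

The emergency-skip label would then gate this job out via a CI rule, mirroring how `pipeline:skip-undercoverage` works today.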