Specify each metric separately in the config
## What does this merge request do and why?
Right now when we run the duo-chat eval pipeline, we always output two tables: one for the `evaluate` metric and the other for the `compare` metric. It would be nice to specify which metric(s) we want in the config.
This will be helpful in several cases. For example, when one of the tables fails, we can rerun just that metric alone. This will become even more important when we add a third metric (!238 (merged)).
This MR makes the following config changes, which will break existing configurations:
- `eval_setup` is renamed to `eval_setups`.
- `evaluating_models` is moved under a new field named `metrics`. This field only needs to be specified when the metric is `independent_llm_judge`; it will be ignored when the metric is "similarity score".
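As a rough sketch, a config under the new schema might look like the following. Only the field names `eval_setups`, `metrics`, `evaluating_models`, and the metric names come from this MR; the exact nesting, keys such as `name`/`metric`, and all placeholder values are assumptions for illustration:

```yaml
# Hypothetical example of the new config shape (structure is assumed).
eval_setups:
  - name: my_setup                  # placeholder setup name
    metrics:
      - metric: similarity score    # evaluating_models is ignored for this metric
      - metric: independent_llm_judge
        evaluating_models:          # only required for independent_llm_judge
          - model_a                 # placeholder model names
          - model_b
```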
## How to set up and validate locally
Numbered steps to set up and validate the change are strongly suggested.
## Merge request checklist

- [ ] I've run the affected pipeline(s) to validate that nothing is broken.
- [ ] Tests added for new functionality. If not, please raise an issue to follow up.
- [ ] Documentation added/updated, if needed.
Edited by Hongtao Yang