Add LLM-judge binary metric (!287) · GitLab.org / ModelOps / AI Model Validation and Research / AI Evaluation / Prompt Library

Bruno Cardoso requested to merge bc/add-binary-metric into main Feb 20, 2024

What does this merge request do and why?

This adds the same LLM-judge binary metric (CORRECT/INCORRECT) that is used on the monolith; see https://gitlab.com/gitlab-org/ai-powered/ai-framework/qa-evaluation/-/issues/1.
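For readers unfamiliar with the idea, here is a minimal sketch of how a CORRECT/INCORRECT LLM-judge metric can work. It is not the code in this merge request: the prompt template, the `parse_verdict` and `binary_metric` helpers, and the `judge` callable are all hypothetical names chosen for illustration, and the real prompt and model client are not shown here.

```python
import re
from typing import Callable, Optional

# Hypothetical prompt template (assumption); the project's actual judge
# prompt is not shown in this merge request.
JUDGE_PROMPT = (
    "Question: {question}\n"
    "Reference answer: {reference}\n"
    "Candidate answer: {candidate}\n"
    "Reply with exactly one word: CORRECT or INCORRECT."
)

# Word boundaries keep "CORRECT" from matching inside "INCORRECT".
_VERDICT_RE = re.compile(r"\b(INCORRECT|CORRECT)\b", re.IGNORECASE)


def parse_verdict(judge_reply: str) -> Optional[bool]:
    """Map a judge reply to True (CORRECT), False (INCORRECT), or None."""
    match = _VERDICT_RE.search(judge_reply)
    if match is None:
        return None  # the judge did not produce a usable verdict
    return match.group(1).upper() == "CORRECT"


def binary_metric(
    question: str,
    reference: str,
    candidate: str,
    judge: Callable[[str], str],
) -> Optional[bool]:
    """Ask the judge for a verdict on one sample and parse its reply.

    `judge` is any callable that takes a prompt string and returns the
    model's text reply (hypothetical interface, assumed for this sketch).
    """
    prompt = JUDGE_PROMPT.format(
        question=question, reference=reference, candidate=candidate
    )
    return parse_verdict(judge(prompt))
```

Passing the judge in as a callable keeps the sketch independent of any particular model client, so it can be exercised with a stub function in tests.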

Example output:

It also works when specifying other metrics:

How to set up and validate locally

Numbered steps to set up and validate the change are strongly suggested.

Ref. #167 (closed)

Merge request checklist

I've run the affected pipeline(s) to validate that nothing is broken.
Tests added for new functionality. If not, please raise an issue to follow up.
Documentation added/updated, if needed.

Edited Feb 20, 2024 by Bruno Cardoso