Add LLM-judge binary metric

Bruno Cardoso requested to merge bc/add-binary-metric into main

What does this merge request do and why?

This adds the same LLM-judge binary metric (CORRECT/INCORRECT) that is used in the monolith: https://gitlab.com/gitlab-org/ai-powered/ai-framework/qa-evaluation/-/issues/1.

Example output: image

It also works when specifying other metrics:

image
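
For reviewers unfamiliar with this kind of metric, here is a minimal sketch of how a CORRECT/INCORRECT judge verdict could be turned into a score. It is not the code in this merge request: the names `parse_verdict` and `binary_accuracy` are hypothetical, and it assumes the judge's reply contains the word CORRECT or INCORRECT somewhere in its text.

```python
import re
from typing import List, Optional

# Hypothetical helper names; they do not come from this merge request.
# The word boundaries stop "CORRECT" from matching inside "INCORRECT".
_VERDICT = re.compile(r"\b(INCORRECT|CORRECT)\b", re.IGNORECASE)


def parse_verdict(judge_output: str) -> Optional[bool]:
    """Return True for CORRECT, False for INCORRECT, None if no verdict is found."""
    match = _VERDICT.search(judge_output)
    if match is None:
        return None
    return match.group(1).upper() == "CORRECT"


def binary_accuracy(judge_outputs: List[str]) -> float:
    """Fraction of outputs judged CORRECT.

    Assumption: outputs with no parseable verdict count as not correct.
    """
    if not judge_outputs:
        return 0.0
    verdicts = [parse_verdict(output) for output in judge_outputs]
    return sum(v is True for v in verdicts) / len(verdicts)
```

Counting unparseable replies as not correct keeps the score conservative; an implementation could equally skip them or report them separately.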

How to set up and validate locally

Numbered steps to set up and validate the change are strongly suggested.

Ref. #167 (closed)

Merge request checklist

  • I've run the affected pipeline(s) to validate that nothing is broken.
  • Tests added for new functionality. If not, please raise an issue to follow up.
  • Documentation added/updated, if needed.
Edited by Bruno Cardoso
