feat: make pairwise evaluation feature-specific with pre-defined evaluators (!152)

Alexander Chueshev requested to merge ac/generic-pairwise-eval into main Sep 24, 2024

What does this merge request do and why?

This MR introduces a set of generic base classes for building feature-specific pairwise evaluation pipelines. Please note that we cannot keep a single generic pairwise evaluation, because dataset schemas and DRIs differ between features. See https://gitlab.com/gitlab-com/content-sites/internal-handbook/-/merge_requests/5388 (internal only).

As an example, this MR demonstrates how to build a pairwise evaluation for Duo Chat and its available datasets.
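To illustrate the general idea of a generic base class with feature-specific subclasses, here is a minimal sketch. It is not the code from this MR: every name in it (`BasePairwiseEvaluator`, `DuoChatDocsPairwiseEvaluator`, `Verdict`, `longer_answer_judge`) and the dataset field names (`question`, `response`) are hypothetical, and the judge is a toy stand-in for whatever model-based comparison the real pipeline uses.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Any, Callable, Mapping


@dataclass(frozen=True)
class Verdict:
    """Outcome of one comparison: 'a', 'b', or 'tie'."""

    winner: str


# A judge receives (question, answer_a, answer_b) and returns a Verdict.
Judge = Callable[[str, str, str], Verdict]


class BasePairwiseEvaluator(ABC):
    """Hypothetical generic base: the comparison logic is shared,
    while reading the dataset schema is left to subclasses."""

    def __init__(self, judge: Judge) -> None:
        self.judge = judge

    @abstractmethod
    def question(self, example: Mapping[str, Any]) -> str:
        """Extract the prompt from a dataset row (schema-specific)."""

    @abstractmethod
    def answer(self, output: Mapping[str, Any]) -> str:
        """Extract the answer text from one run's output (schema-specific)."""

    def evaluate(
        self,
        example: Mapping[str, Any],
        output_a: Mapping[str, Any],
        output_b: Mapping[str, Any],
    ) -> Verdict:
        # Shared step: normalize both outputs, then ask the judge.
        return self.judge(
            self.question(example),
            self.answer(output_a),
            self.answer(output_b),
        )


class DuoChatDocsPairwiseEvaluator(BasePairwiseEvaluator):
    """Hypothetical feature-specific evaluator; field names are assumed."""

    def question(self, example: Mapping[str, Any]) -> str:
        return example["question"]

    def answer(self, output: Mapping[str, Any]) -> str:
        return output["response"]


def longer_answer_judge(question: str, a: str, b: str) -> Verdict:
    """Toy judge that prefers the longer answer; a real judge would
    typically call a model instead."""
    if len(a) == len(b):
        return Verdict("tie")
    return Verdict("a" if len(a) > len(b) else "b")


evaluator = DuoChatDocsPairwiseEvaluator(judge=longer_answer_judge)
verdict = evaluator.evaluate(
    {"question": "How do I create an issue?"},
    {"response": "Use the New issue button."},
    {"response": "Click New."},
)
print(verdict.winner)
```

Under this design, a dataset with a different schema would only need another small subclass that overrides `question` and `answer`, which is one way to keep ownership of each feature's evaluator separate.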

How to set up and validate locally

Check out this merge request's branch.
Update your .env file.
Install dependencies.
```
poetry install
```
Check the existing commands ELI5 provides:
```
poetry run eli5 --help
```

Run pairwise evaluation for the Duo Chat documentation-related dataset:

```
poetry run eli5 duo-chat evaluate pairwise c1fe0d17-32eb-4697-a5c9-0d5dbb1eb20c b6af3206-9807-4754-ac31-2deb43a1a320 --dataset=duo_chat.cot_qa_docs.1
```

Run pairwise evaluation for the Duo Chat issue/epic-related dataset:

```
poetry run eli5 duo-chat evaluate pairwise b26f592e-1398-4284-ad03-81c486e32bfc 5a67fedf-bd26-42fc-89f4-4c8875ad0f28 --dataset=duo_chat.cot_qa_resources.1
```

Merge request checklist

Tests added for new functionality. If not, please raise an issue to follow up.
Documentation added/updated, if needed.