Ensure the LLM-as-judge prompt uses `answer` and `result` correctly
What does this merge request do and why?
The prompt in the LLM-as-judge evaluator had `answer` and `result` swapped: `answer` should refer to the expected answer, while `result` is the output that is being evaluated.
See docs here:
- https://docs.smith.langchain.com/old/evaluation/faq/evaluator-implementations#correctness-qa-evaluation
- https://docs.smith.langchain.com/tutorials/Developers/evaluation#define-metrics
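To illustrate the fix, here is a minimal sketch of the corrected wiring. The template text, variable names, and helper function are illustrative assumptions, not the actual evaluator prompt:

```python
# Sketch only: `answer` is the expected (reference) answer, `result` is the
# model output under evaluation. Template wording is hypothetical.
JUDGE_PROMPT = (
    "You are grading a model's output against a reference answer.\n"
    "Expected answer:\n{answer}\n\n"
    "Output to evaluate:\n{result}\n\n"
    "Grade the output as CORRECT or INCORRECT."
)

def build_judge_prompt(answer: str, result: str) -> str:
    """Fill the template so the expected answer and the evaluated
    result land in the correct slots (previously they were swapped)."""
    return JUDGE_PROMPT.format(answer=answer, result=result)
```

With the slots in this order, the judge compares the evaluated output against the reference rather than the other way around.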
How to set up and validate locally
Example command:

```shell
poetry run eli5 code-suggestions evaluate \
  --dataset="code-suggestions-input-testcases-v1" \
  --source=gitlab \
  --limit=50 \
  --offset=0 \
  --evaluate-with-llm \
  --experiment-prefix=llm-test-1
```
Merge request checklist
- [ ] Tests added for new functionality. If not, please raise an issue to follow up.
- [ ] Documentation added/updated, if needed.