Ensure the LLM-as-judge prompt uses `answer` and `result` correctly

Pam Artiaga requested to merge pam/5-improve-llm-as-judge-prompt into main

What does this merge request do and why?

The prompt in the LLM-as-judge evaluator had `answer` and `result` swapped: `answer` should refer to the expected answer, while `result` is the output that is being evaluated.
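As a sketch of the corrected mapping (the actual template text and function names in the evaluator may differ; these are illustrative only), the expected answer is interpolated into the `answer` slot and the model output under evaluation into the `result` slot:

```python
# Hypothetical sketch of the corrected prompt construction.
# The real eli5 template and helper names may differ.
JUDGE_PROMPT_TEMPLATE = (
    "You are evaluating a code suggestion.\n"
    "Expected answer:\n{answer}\n\n"
    "Result to evaluate:\n{result}\n\n"
    "Rate how closely the result matches the expected answer."
)


def build_judge_prompt(expected_answer: str, model_output: str) -> str:
    # `answer` = the expected (reference) answer;
    # `result` = the output being judged. Swapping these keys was the bug.
    return JUDGE_PROMPT_TEMPLATE.format(
        answer=expected_answer,
        result=model_output,
    )


prompt = build_judge_prompt(
    expected_answer="def add(a, b): return a + b",
    model_output="def add(x, y): return x + y",
)
```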

See docs here:

How to set up and validate locally

Example command:

poetry run eli5 code-suggestions evaluate \
  --dataset="code-suggestions-input-testcases-v1" \
  --source=gitlab \
  --limit=50 \
  --offset=0 \
  --evaluate-with-llm \
  --experiment-prefix=llm-test-1

Merge request checklist

  • Tests added for new functionality. If not, please raise an issue to follow up.
  • Documentation added/updated, if needed.
Edited by Pam Artiaga