Ensure the LLM-as-judge prompt uses `answer` and `result` correctly
What does this merge request do and why?
The prompt in the LLM-as-judge evaluator had `answer` and `result` swapped: `answer` should refer to the expected answer, while `result` is the output that is being evaluated.
See docs here:
- https://docs.smith.langchain.com/old/evaluation/faq/evaluator-implementations#correctness-qa-evaluation
- https://docs.smith.langchain.com/tutorials/Developers/evaluation#define-metrics
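To illustrate the fix, here is a minimal sketch of the corrected wiring. The template text, variable names, and helper function are illustrative assumptions, not the actual evaluator prompt:

```python
# Sketch only: `answer` is the expected (reference) answer, `result` is the
# model output under evaluation. Template wording is hypothetical.
JUDGE_PROMPT = (
    "You are grading a model's output against a reference answer.\n"
    "Expected answer:\n{answer}\n\n"
    "Output to evaluate:\n{result}\n\n"
    "Grade the output as CORRECT or INCORRECT."
)

def build_judge_prompt(answer: str, result: str) -> str:
    """Fill the template so the expected answer and the evaluated
    result land in the correct slots (previously they were swapped)."""
    return JUDGE_PROMPT.format(answer=answer, result=result)
```

With the slots in this order, the judge compares the evaluated output against the reference rather than the other way around.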
How to set up and validate locally
Example command:

```shell
poetry run eli5 code-suggestions evaluate \
  --dataset="code-suggestions-input-testcases-v1" \
  --source=gitlab \
  --limit=50 \
  --offset=0 \
  --evaluate-with-llm \
  --experiment-prefix=llm-test-1
```
Merge request checklist
- [ ] Tests added for new functionality. If not, please raise an issue to follow up.
- [ ] Documentation added/updated, if needed.