feat: poc for functional correctness with mbpp
What does this merge request do and why?
Enables the functional correctness evaluator (py-functional-correctness) for the MBPP dataset, as a proof of concept.
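For reviewers unfamiliar with the term, "functional correctness" here means executing the generated code against the dataset's test assertions instead of comparing text. The snippet below is only a rough sketch of that idea and is not the evaluator's actual implementation: `passes_tests`, the `add` example, and the sample record are hypothetical, and the only thing taken from MBPP is the convention that each record carries a `test_list` of assert statements. It also runs code with a bare `exec`, which is unsafe for untrusted model output; a real evaluator would presumably need sandboxing and timeouts.

```python
from typing import List


def passes_tests(candidate_code: str, test_list: List[str]) -> bool:
    """Return True if candidate_code runs and every assert in test_list holds.

    Hypothetical helper for illustration only. It executes code in-process
    with no sandbox or timeout, so it must not be used on untrusted output.
    """
    namespace: dict = {}
    try:
        # Define the candidate's functions in a fresh namespace.
        exec(candidate_code, namespace)
        # Run each MBPP-style assert statement against that namespace.
        for test in test_list:
            exec(test, namespace)
    except Exception:
        # Syntax errors, runtime errors, and failed asserts all count as a fail.
        return False
    return True


# Illustrative record shaped like an MBPP entry (not taken from the dataset).
sample = {
    "test_list": ["assert add(1, 2) == 3", "assert add(-1, 1) == 0"],
}

good = "def add(a, b):\n    return a + b\n"
bad = "def add(a, b):\n    return a - b\n"

print(passes_tests(good, sample["test_list"]))  # True
print(passes_tests(bad, sample["test_list"]))   # False
```

Under this framing, a completion counts as correct only if all of its asserts pass, regardless of how closely it matches the reference solution.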
How to set up and validate locally
poetry run eli5 code-suggestions evaluate \
--dataset="mbpp_functional_small" \
--evaluators="py-functional-correctness" \
--intent=completion \
--source="ai-gateway" \
--model-name="codestral" \
--model-provider="litellm" \
--model-endpoint="http://localhost:4000" \
--no-upload \
--limit=1
Merge request checklist
- Tests added for new functionality. If not, please raise an issue to follow up.
- Documentation added/updated, if needed.
Closes #32
Edited by Eduardo Bonet