feat: poc for functional correctness with mbpp

Eduardo Bonet requested to merge 32-implement-functional-correctness-for-mbpp into main

What does this merge request do and why?

Enables the functional-correctness evaluator (py-functional-correctness) for the MBPP dataset, as a proof of concept.
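
For readers unfamiliar with the idea, functional correctness means a generated solution is judged by whether it passes the task's tests rather than by textual similarity to a reference. The sketch below illustrates that check under stated assumptions: it is not the evaluator's actual code, the function name `passes_tests` and the timeout value are hypothetical, and it assumes each task supplies assert-style test statements (as MBPP does in its `test_list` field).

```python
import subprocess
import sys


def passes_tests(candidate_code: str, test_asserts: list[str], timeout_s: float = 5.0) -> bool:
    """Return True if the candidate runs and every assert-style test passes.

    Hypothetical helper for illustration only. The candidate is executed in a
    separate Python process so that a crash or an infinite loop in generated
    code cannot take down the caller.
    """
    # Concatenate the candidate solution and its tests into one program.
    program = "\n".join([candidate_code, *test_asserts])
    try:
        completed = subprocess.run(
            [sys.executable, "-c", program],
            capture_output=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        # Treat a hang as a failed sample.
        return False
    # A failing assert (or any other uncaught exception) gives a non-zero exit code.
    return completed.returncode == 0


if __name__ == "__main__":
    good = "def add(a, b):\n    return a + b"
    bad = "def add(a, b):\n    return a - b"
    tests = ["assert add(1, 2) == 3"]
    print(passes_tests(good, tests))
    print(passes_tests(bad, tests))
```

A separate process gives only basic isolation; running untrusted model output safely would need a proper sandbox, which this sketch does not attempt.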

How to set up and validate locally

poetry run eli5 code-suggestions evaluate \
    --dataset="mbpp_functional_small" \
    --evaluators="py-functional-correctness" \
    --intent=completion \
    --source="ai-gateway" \
    --model-name="codestral" \
    --model-provider="litellm" \
    --model-endpoint="http://localhost:4000" \
    --no-upload \
    --limit=1

Merge request checklist

  • Tests added for new functionality. If not, please raise an issue to follow up.
  • Documentation added/updated, if needed.

Closes #32 (closed)

Edited by Eduardo Bonet
