feat: integrate eli5 and add eval command

What does this merge request do and why?

Closes gitlab-org/modelops/ai-model-validation-and-research/ai-evaluation/prompt-library#622 (closed)

How to set up and validate locally

  1. Add the following to your .env:
LANGCHAIN_TRACING_V2=true
LANGCHAIN_ENDPOINT="https://api.smith.langchain.com"
LANGCHAIN_API_KEY=[my-api-key]
  2. Create a dataset in your LangChain (LangSmith) account from this jsonl (see the docs at https://docs.smith.langchain.com/old/evaluation/faq/manage-datasets#upload-a-csv) and name it gen-desc-ds

  3. Run an evaluation for a given prompt+version against an existing dataset:

poetry run python eval generate_description 1.0.0 gen-desc-ds

The output should include a link to see the evaluation results. Open it and verify you get stats for each example in the dataset.
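
If the command fails before printing a link, one likely cause is a missing tracing variable. The following is a minimal sketch, not part of this merge request, that checks a `.env` file for the three variables listed in the setup steps. The helper names `parse_env` and `missing_vars` are hypothetical, and the parser assumes plain `KEY=VALUE` lines only.

```python
from pathlib import Path

# The three variables named in the setup steps above.
REQUIRED_VARS = ("LANGCHAIN_TRACING_V2", "LANGCHAIN_ENDPOINT", "LANGCHAIN_API_KEY")


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding quotes, as in LANGCHAIN_ENDPOINT="https://...".
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_vars(env: dict[str, str]) -> list[str]:
    """Return the required variables that are absent or empty."""
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    env_file = Path(".env")
    env = parse_env(env_file.read_text()) if env_file.exists() else {}
    missing = missing_vars(env)
    print("missing:", ", ".join(missing) if missing else "none")
```

Run it from the repository root, next to the `.env` file. It only checks that the variables are present; it does not validate the API key against the service.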

Merge request checklist

  • Tests added for new functionality. If not, please raise an issue to follow up.
  • Documentation added/updated, if needed.
Edited by Alejandro Rodríguez
