Version 0.20.0

Features

- Add AWS Bedrock testing
- Add eval for /refactor with readability judge
- Add local file input source to the ETV pipeline
- Add docs for /refactor evals
- Allow specifying concurrency for the gitlab-docs eval
- Allow the Duo Chat regression evaluator to expand the GITLAB_BASE_URL variable in datasets (see the sketch after this list)
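
To picture what the GITLAB_BASE_URL expansion enables, here is a minimal sketch, assuming dataset rows carry `${GITLAB_BASE_URL}` placeholders in their string fields. The function name and dataset shape are hypothetical illustrations, not the evaluator's actual code:

```python
import os
from string import Template


def expand_gitlab_base_url(example: dict) -> dict:
    """Substitute ${GITLAB_BASE_URL} placeholders in the string fields of
    a dataset example with the value taken from the environment."""
    base_url = os.environ.get("GITLAB_BASE_URL", "https://gitlab.com")
    return {
        key: Template(value).safe_substitute(GITLAB_BASE_URL=base_url)
        if isinstance(value, str)
        else value
        for key, value in example.items()
    }


# Hypothetical dataset row pointing at an instance-specific URL.
row = {"input": "Summarize ${GITLAB_BASE_URL}/gitlab-org/gitlab/-/issues/1"}
print(expand_gitlab_base_url(row))
```

Using `safe_substitute` means rows without the placeholder pass through unchanged, so datasets stay portable across GitLab instances.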

Fixes

- Fix missing boto3 dependencies
- Remove unnecessary show_default flag
- Do not auto-stub environment variables
- Quick fix: Increase max workers for Duo Workflow SWE Benchmark
- Troubleshoot setup.sh
- Remove the reference to the missing README.md from eli5/Dockerfile to fix the build

Internal

- Add validation for similarity score
- Add core evaluator to monitor tool trajectory
- Optimize the SWE Benchmark to skip examples whose metrics were already calculated
- Introduce rate_limit and rate_limit_period options to the Duo Workflow SWE pipeline (both behaviors are sketched after this list)
- Multiple dependency updates (pydantic, pytest, langchain, etc.)
- Code ownership updates (added David and bcardoso-)
- Documentation improvements (context evals, ELI5 troubleshooting)
- Configuration updates (Python version alignment, dataset configs)
- CI/CD improvements (Use CI_COMMIT_REF_SLUG for building eli image)
- Code cleanup and refactoring
- Move ELI5 README to main README
- Support dataset re-registration for different owners
- Basic notebook for local evaluation of VR
- Format ELI5 setup script
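
For the two SWE Benchmark items above (skipping already-scored examples, and the new rate_limit / rate_limit_period options), the combined behavior can be pictured roughly as follows. Apart from the two parameter names, everything here (function names, data shapes, the fixed-window throttle) is a hypothetical sketch, not the pipeline's actual code:

```python
import time


def evaluate_example(example: dict) -> dict:
    """Stand-in for the real evaluation call; returns a dummy metric."""
    return {"resolved": False}


def run_swe_benchmark(examples, cached_metrics, rate_limit=10, rate_limit_period=60.0):
    """Evaluate examples, skipping any whose metrics were already calculated
    and issuing at most `rate_limit` calls per `rate_limit_period` seconds."""
    results = dict(cached_metrics)  # seed with previously calculated metrics
    window_start, calls_in_window = time.monotonic(), 0
    for example in examples:
        if example["id"] in results:
            continue  # metrics already calculated for this example: skip it
        if calls_in_window >= rate_limit:  # simple fixed-window throttle
            elapsed = time.monotonic() - window_start
            if elapsed < rate_limit_period:
                time.sleep(rate_limit_period - elapsed)
            window_start, calls_in_window = time.monotonic(), 0
        results[example["id"]] = evaluate_example(example)
        calls_in_window += 1
    return results
```

Seeding the result map from the cache makes reruns idempotent: only examples without stored metrics cost an evaluation call, and those calls are throttled to the configured window.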