## Version 0.20.0

### Features

- Add AWS Bedrock testing
- Add eval for /refactor with readability judge
- Add local file input source to the ETV pipeline
- Add docs for /refactor evals
- Allow specifying concurrency for the gitlab-docs eval
- Allow the Regression evaluator for Duo Chat to expand the GITLAB_BASE_URL variable in datasets (see the expansion sketch at the end of these notes)

### Fixes

- Fix missing boto3 dependencies
- Remove unnecessary show_default flag
- Do not auto-stub environment variables
- Quick fix: increase max workers for the Duo Workflow SWE Benchmark
- Troubleshoot setup.sh
- Remove the missing README.md from eli5/Dockerfile to fix the build

### Internal

- Add validation for similarity scores
- Add core evaluator to monitor tool trajectories
- Optimize the SWE Benchmark to skip examples with already-calculated metrics
- Introduce rate_limit and rate_limit_period to the Duo Workflow SWE pipeline (a rate-limiting sketch follows at the end of these notes)
- Multiple dependency updates (pydantic, pytest, langchain, etc.)
- Code ownership updates (added David and bcardoso-)
- Documentation improvements (context evals, ELI5 troubleshooting)
- Configuration updates (Python version alignment, dataset configs)
- CI/CD improvements (use CI_COMMIT_REF_SLUG for building the eli5 image)
- Code cleanup and refactoring
- Move the ELI5 README to the main README
- Support dataset re-registration for different owners
- Basic notebook for local evaluation of VR
- Format the ELI5 setup script
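
To illustrate the GITLAB_BASE_URL feature above: dataset rows can now reference the instance under test with a variable instead of a hard-coded URL. The evaluator's actual expansion code is not reproduced in these notes; the following is a minimal sketch of the idea, and `expand_env_vars` and the sample row are hypothetical.

```python
import os
from string import Template


def expand_env_vars(value: str) -> str:
    """Expand $VAR / ${VAR} references using the current environment.

    Hypothetical helper: the Regression evaluator's real expansion
    logic may differ from this sketch.
    """
    return Template(value).safe_substitute(os.environ)


# Example dataset row pointing at the GitLab instance under test.
row = {
    "question": "What is an epic?",
    "url": "${GITLAB_BASE_URL}/help/user/group/epics/index.md",
}

os.environ.setdefault("GITLAB_BASE_URL", "https://gitlab.example.com")
expanded = {key: expand_env_vars(value) for key, value in row.items()}
print(expanded["url"])
# -> https://gitlab.example.com/help/user/group/epics/index.md
```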
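For the new rate_limit and rate_limit_period options in the Duo Workflow SWE pipeline, a common reading is "at most `rate_limit` requests per `rate_limit_period` seconds". The sketch below shows one way such a limiter can work; it borrows the option names only, and the pipeline's actual implementation is not shown here.

```python
import time
from collections import deque


class RateLimiter:
    """Allow at most `rate_limit` calls per `rate_limit_period` seconds.

    Illustrative sliding-window limiter, not the pipeline's own code.
    """

    def __init__(self, rate_limit: int, rate_limit_period: float):
        self.rate_limit = rate_limit
        self.rate_limit_period = rate_limit_period
        self._calls: deque[float] = deque()

    def acquire(self) -> None:
        now = time.monotonic()
        # Drop timestamps that have aged out of the window.
        while self._calls and now - self._calls[0] >= self.rate_limit_period:
            self._calls.popleft()
        if len(self._calls) >= self.rate_limit:
            # Wait until the oldest call leaves the window.
            wait = self.rate_limit_period - (now - self._calls[0])
            time.sleep(max(wait, 0.0))
            self._calls.popleft()  # the oldest call has now aged out
        self._calls.append(time.monotonic())


# Usage: call acquire() before each benchmark request.
limiter = RateLimiter(rate_limit=5, rate_limit_period=60.0)
limiter.acquire()
```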