Add script to fetch latest metrics from daily run
- 2 of 3 checklist items completed
- Merged
- 18
- Approved
Support local input to avoid calling GitLab API
- 2 of 3 checklist items completed
- Merged
- 27
- Approved
Add support to evaluate with Claude 3 Haiku
- 1 of 3 checklist items completed
- Merged
- 1
- Approved
Add Gemini Pro 1.5 model support
- 1 of 3 checklist items completed
- Merged
- 7
- Approved
Add claude-3
- 1 of 3 checklist items completed
- Merged
- 8
- Approved
Add model handler from Gemini PRO
- 1 of 3 checklist items completed
- Merged
- 2
- Approved
Support local output of Chat evaluation results
- 3 of 3 checklist items completed
- Merged
- 22
- Approved
Added ability to compare with ground truth
- 0 of 3 checklist items completed
- Merged
- 1
- Approved
Added dry run and test run for easy debugging
- 1 of 3 checklist items completed
- Merged
- 10
- Approved
Collective LLM Judge
- 0 of 3 checklist items completed
- Merged
- 41
- Approved
Add option to change the BigQuery write disposition
- 3 of 3 checklist items completed
- Merged
- 2
- Approved
Add pipeline start timestamp
- 1 of 3 checklist items completed
- Merged
- 3
- Approved
Add pipeline to run Duo Chat evaluation locally without using GitLab API
- 2 of 3 checklist items completed
- Merged
- 3
Specify each metric separately in the config
- 1 of 3 checklist items completed
- Merged
- 14
- 1
- Approved
Allow get_response to retry a few times
- 1 of 3 checklist items completed
- Merged
- 16
- Approved
Added text-bison-32k model
- 0 of 3 checklist items completed
- Merged
- Approved
Clean chat history before each question
- 2 of 3 checklist items completed
- Merged
- 14
- Approved
Add model similarity pipeline to duo-chat eval
- 1 of 3 checklist items completed
- Merged
- 16
- Approved
Resolve "Adding Code Generation Open Source Datasets to Prompt Library for Chat Eval"
- 1 of 3 checklist items completed
- Merged
- 20
Build client and runner images in CI
- 0 of 3 checklist items completed
- Merged
- 1
- Approved