Create /tests evaluator and register evaluation command in CEF
Context
`/tests` is a Duo Chat IDE slash command that generates unit tests for the user's selected code. In &16634, our objective is to establish an evaluation process that helps us assess and monitor the accuracy of the tests generated by `/tests`, particularly as we evaluate new models or new versions of existing models.
For this issue, we can use the dataset created in #515914 (closed) to help us craft the evaluator for `/tests` (this can be worked on in parallel).
Proposal
- Determine evaluation criteria for this command. Examples:
  - Test coverage (does the output cover all major workflows and edge cases?)
  - Others?
- Design and implement the evaluator(s) in CEF (a minimal sketch is included after this list).
  - If more than one evaluator is required, it may be worth splitting the work into a separate issue.
- Register the `/tests` command in CEF (see the wiring sketch below).
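
To make the evaluator item more concrete, below is a minimal LLM-as-judge sketch for the test-coverage criterion. All names here (`TestCoverageEvaluator`, `llm_judge`, the prompt text, and the JSON response format) are illustrative assumptions, not existing CEF interfaces; the real implementation would plug into CEF's own evaluator abstractions.

```python
import json
from dataclasses import dataclass
from typing import Callable

# NOTE: every name below is an illustrative assumption, not an actual CEF interface.

JUDGE_PROMPT = """You are reviewing unit tests that were generated for a piece of code.

Code under test:
{code}

Generated tests:
{tests}

Rate the test coverage from 1 (poor) to 5 (excellent), considering whether all
major workflows and edge cases are exercised. Respond with JSON only:
{{"score": <1-5>, "reasoning": "<short explanation>"}}
"""


@dataclass
class CoverageResult:
    score: int       # 1-5 rating returned by the judge model
    reasoning: str   # judge's short justification, useful for error analysis


class TestCoverageEvaluator:
    """LLM-as-judge evaluator for the /tests command (illustrative sketch only)."""

    def __init__(self, llm_judge: Callable[[str], str]):
        # `llm_judge` is any callable that takes a prompt string and returns the
        # judge model's raw text completion; keeping it injectable avoids tying
        # the evaluator to a specific model client.
        self.llm_judge = llm_judge

    def evaluate(self, code: str, generated_tests: str) -> CoverageResult:
        prompt = JUDGE_PROMPT.format(code=code, tests=generated_tests)
        raw = self.llm_judge(prompt)
        parsed = json.loads(raw)
        return CoverageResult(score=int(parsed["score"]), reasoning=str(parsed["reasoning"]))
```

Keeping the judge model injectable also makes it straightforward to compare scores across judge versions as we evaluate new models.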
It may be helpful to reference the code in gitlab-org/modelops/ai-model-validation-and-research/ai-evaluation/prompt-library!985 (diffs).
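
For the registration step, a hedged sketch of how the `/tests` command could be wired to its evaluator and run over the #515914 dataset is shown below. `EVALUATORS` and `run_eval` are hypothetical stand-ins; the actual registration should mirror the pattern in prompt-library!985.

```python
# Hypothetical wiring only; follow the pattern in prompt-library!985 for the
# real registration. `EVALUATORS` and `run_eval` are illustrative stand-ins.

EVALUATORS = {
    # maps a slash command name to the evaluator class from the sketch above
    "tests": TestCoverageEvaluator,
}


def run_eval(command: str, dataset: list[dict], llm_judge) -> float:
    """Run the registered evaluator over a dataset and return the mean score."""
    evaluator = EVALUATORS[command](llm_judge)
    scores = [
        evaluator.evaluate(row["code"], row["generated_tests"]).score
        for row in dataset
    ]
    return sum(scores) / len(scores)
```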