
Create /tests evaluator and register evaluation command in CEF

Context

/tests is a Duo Chat IDE slash command that generates unit tests for the user's selected code.

In &16634, our objective is to establish an evaluation process that helps us assess and monitor the accuracy of the tests generated by /tests, particularly as we evaluate new models or new versions of existing models.

For this issue, we can use the dataset created in #515914 (closed) to help craft the evaluator for /tests (the evaluator can be worked on in parallel with the dataset).

Proposal

  1. Determine evaluation criteria for this command. Examples:
  • Test coverage (does the output cover all major workflows and edge cases?)
  • Others?
  2. Design and implement the evaluator(s) in CEF (see the sketch after this list).
  • If more than one evaluator is required, consider splitting that work into a separate issue.
  3. Register the /tests command in CEF.
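
As a starting point for step 2, here is a minimal sketch of a test-coverage evaluator, assuming CEF accepts LangSmith-style custom evaluators (a Run/Example pair in, a score dict out). The function name, the `expected_scenarios` dataset field, and the lexical scoring heuristic are all hypothetical placeholders, not the actual CEF interface:

```python
# Minimal sketch of a test-coverage evaluator. Assumes a LangSmith-style
# custom-evaluator signature; `coverage_evaluator` and the
# `expected_scenarios` dataset field are hypothetical.
from langsmith.schemas import Example, Run


def coverage_evaluator(run: Run, example: Example) -> dict:
    """Score how many expected scenarios the generated tests exercise."""
    generated_tests = (run.outputs or {}).get("output", "")
    expected = (example.outputs or {}).get("expected_scenarios", [])

    if not expected:
        # No reference scenarios in this dataset row; skip scoring.
        return {"key": "test_coverage", "score": None}

    # Naive lexical containment check as a stand-in. A production evaluator
    # would more likely use an LLM judge to decide, scenario by scenario,
    # whether the generated tests cover each workflow or edge case.
    covered = sum(
        1 for scenario in expected if scenario.lower() in generated_tests.lower()
    )
    return {"key": "test_coverage", "score": covered / len(expected)}
```

A real evaluator would likely replace the lexical check with an LLM judge, which is one more reason to pin down the criteria in step 1 first.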

It may be helpful to reference the code in gitlab-org/modelops/ai-model-validation-and-research/ai-evaluation/prompt-library!985 (diffs).
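
Purely as an illustration of how the registered command, dataset, and evaluator might fit together (the real registration mechanism is in the MR above), a LangSmith-style harness run could look like the following. The dataset name, the target stub, and the experiment prefix are all assumptions:

```python
# Hypothetical wiring only; the actual registration lives in the CEF /
# prompt-library code referenced above.
from langsmith.evaluation import evaluate


def run_tests_command(inputs: dict) -> dict:
    """Placeholder target: invoke Duo Chat's /tests command on the input code.

    In CEF this would call the registered /tests pipeline; here it is
    stubbed so the sketch stays self-contained.
    """
    return {"output": f"# generated tests for:\n{inputs['selected_code']}"}


results = evaluate(
    run_tests_command,
    data="duo-chat-tests-command",    # assumed name of the #515914 dataset
    evaluators=[coverage_evaluator],  # from the sketch above
    experiment_prefix="tests-command-eval",
)
```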
