Split duochat documentation eval
Problem
eli5 implements an end-to-end approach to evaluating documentation search. This captures the user experience as a whole, but a single end-to-end score makes it hard for a team trying to improve it to decide where to focus.
Proposed Solution
A request for documentation goes through several steps:
- Identify from the user's input that documentation is needed
- Format the action input (the query passed to the documentation search)
- Retrieve the correct documents based on the action input
- Generate a response based on the retrieved documents
Each of these steps should be logged and can be evaluated independently, so that a change in the end-to-end score can be traced to the step that caused it.
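A minimal sketch of what a per-step log record and independent step scores could look like, written in Python. Everything here is hypothetical: the `DocsRequestTrace` record, its field names, the helper functions, the non-empty check for the action input, and recall@k as the retrieval metric are illustrative assumptions, not part of eli5 or Duo Chat. Response generation (the last step) is not scored in the sketch; it could continue to be judged end to end, for example by eli5.

```python
from dataclasses import dataclass, field
from statistics import fmean
from typing import Optional


@dataclass
class DocsRequestTrace:
    """Hypothetical log record for one documentation request (one field per step)."""
    user_input: str
    needs_docs_label: bool                # ground truth: should docs be searched?
    needs_docs_predicted: bool            # step 1: did the agent decide to search docs?
    action_input: Optional[str] = None    # step 2: query passed to the docs search
    retrieved_ids: list[str] = field(default_factory=list)  # step 3: ranked doc ids
    relevant_ids: list[str] = field(default_factory=list)   # ground-truth relevant ids
    response: str = ""                    # step 4: final answer (not scored here)


def intent_correct(t: DocsRequestTrace) -> bool:
    # Step 1: was the "documentation is needed" decision correct?
    return t.needs_docs_predicted == t.needs_docs_label


def action_input_valid(t: DocsRequestTrace) -> bool:
    # Step 2: placeholder check that a non-empty query was produced.
    return bool(t.action_input and t.action_input.strip())


def recall_at_k(t: DocsRequestTrace, k: int = 5) -> float:
    # Step 3: fraction of relevant docs found in the top-k retrieved results.
    # Traces without labelled relevant docs score 0.0 here (a simplifying choice).
    if not t.relevant_ids:
        return 0.0
    relevant = set(t.relevant_ids)
    return len(set(t.retrieved_ids[:k]) & relevant) / len(relevant)


def step_scores(traces: list[DocsRequestTrace], k: int = 5) -> dict[str, float]:
    """Score steps 1-3 separately so a drop can be attributed to one stage."""
    scores = {"intent_accuracy": fmean(intent_correct(t) for t in traces)}
    # Steps 2 and 3 are only scored where docs were both needed and searched,
    # so an intent mistake is not counted a second time as a retrieval mistake.
    searched = [t for t in traces if t.needs_docs_label and t.needs_docs_predicted]
    if searched:
        scores["action_input_valid"] = fmean(action_input_valid(t) for t in searched)
        scores[f"recall@{k}"] = fmean(recall_at_k(t, k) for t in searched)
    return scores
```

Conditioning the later steps on the earlier ones succeeding is one possible design; an alternative is to feed each step a gold input (for example a labelled action input for retrieval) so every step is measured in isolation.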