Provide a summary report for each evaluation
Problem to solve
We currently generate raw output with metrics for each test. The user has to manually compare the experiment results against the control results to assess performance.
Proposal
Provide a summary report so that the impact of an experiment can be assessed quickly.
- Generate a summary report showing statistics for each metric.
- Compare the results with the control dataset.
- Indicate whether each result is better or worse than the control numbers.
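The three points above could be sketched roughly as follows. This is only an illustration under stated assumptions, not a proposed implementation: the function names (`summarize`, `format_report`), the metric names, the input shape (a mapping from metric name to a list of per-test values), and the use of the mean as the summary statistic are all hypothetical. It also assumes that higher is better for every metric except those listed in `LOWER_IS_BETTER`.

```python
from statistics import mean

# Assumption: metrics listed here improve when their value goes down.
# The metric name is hypothetical and only used for illustration.
LOWER_IS_BETTER = {"latency_ms"}


def summarize(experiment, control, tolerance=0.0):
    """Compare mean metric values of an experiment run against a control run.

    Both arguments map a metric name to the list of per-test values.
    Only metrics present in both runs are compared.
    """
    rows = []
    for metric in sorted(experiment.keys() & control.keys()):
        exp_mean = mean(experiment[metric])
        ctl_mean = mean(control[metric])
        delta = exp_mean - ctl_mean
        if abs(delta) <= tolerance:
            verdict = "same"
        else:
            improved = delta < 0 if metric in LOWER_IS_BETTER else delta > 0
            verdict = "better" if improved else "worse"
        rows.append({
            "metric": metric,
            "experiment": exp_mean,
            "control": ctl_mean,
            "delta": delta,
            "verdict": verdict,
        })
    return rows


def format_report(rows):
    """Render the comparison rows as a plain-text table."""
    lines = [f"{'metric':<12}{'experiment':>12}{'control':>12}{'delta':>10}  verdict"]
    for r in rows:
        lines.append(
            f"{r['metric']:<12}{r['experiment']:>12.3f}"
            f"{r['control']:>12.3f}{r['delta']:>+10.3f}  {r['verdict']}"
        )
    return "\n".join(lines)


if __name__ == "__main__":
    # Made-up example values, not real evaluation results.
    experiment = {"accuracy": [0.8, 0.9], "latency_ms": [120, 140]}
    control = {"accuracy": [0.7, 0.8], "latency_ms": [100, 110]}
    print(format_report(summarize(experiment, control)))
```

The `tolerance` parameter is one possible way to avoid flagging tiny differences as better or worse; a real report would probably want a per-metric threshold or a significance test instead.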
Further details
Links / references
Inspiration: https://www.promptfoo.dev/docs/intro/
Edited by Mon Ray