Log error on duplicate results instead of raising RuntimeError
What does this merge request do and why?
This MR replaces the RuntimeError raised when encountering an unexpected number of responses (generally caused by duplicate rows in the source dataset). The goal of the MR is not to throw out all the results just because a number of duplicate rows exist.
How to set up and validate locally
Numbered steps to set up and validate the change are strongly suggested.
- Run the prompt library with the
similarity_score
metric enabled on a dataset that has duplicate values (e.g.dev-ai-research-0e2f8974.duo_chat_experiments.mbpp_validation_7_records_duplicates
- Ensure that error logs appear warning about
"Wrong number of answers for a question!"
- Ensure that the resulting dataset has the expected number of results (i.e. the normal amount except the duplicates)
Alternatively, see the results of a previous run under dev-ai-research-0e2f8974.duo_chat_experiments.aherczeg_duplicate_test__*
Merge request checklist
-
I've ran the affected pipeline(s) to validate that nothing is broken. -
Tests added for new functionality. If not, please raise an issue to follow up. -
Documentation added/updated, if needed.
Edited by Andras Herczeg