Added ability to compare with ground truth
What does this merge request do and why?
This MR adds the ability to compare model answers with the ground truth for the MBPP dataset.
- Added a HUMAN dummy model. We first read in all ground-truth answers, then mimic model API calls using the dummy HUMAN model. All it does is pull out the ground-truth answers and return them as the HUMAN model's completions.
- To compare with the ground truth, specify "human" as one of the answering models in the config.
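The dummy-model flow described above can be sketched roughly as follows. This is a minimal illustration, not the code in this MR: the MbppProblem record, the ApiModel placeholder, the build_models helper, and the answering_models config key are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from typing import Dict, List


# Hypothetical record shape. The original MBPP release stores the reference
# solution in a "code" field, but this repo's loader may name things differently.
@dataclass
class MbppProblem:
    task_id: int
    prompt: str
    ground_truth: str


class HumanModel:
    """Dummy "model" that answers every prompt with its ground-truth solution."""

    def __init__(self, problems: List[MbppProblem]) -> None:
        # Read in all ground-truth answers up front, keyed by task id.
        self._answers: Dict[int, str] = {p.task_id: p.ground_truth for p in problems}

    def complete(self, problem: MbppProblem) -> str:
        # Mimic an API call: nothing is sent anywhere, the stored answer is returned.
        return self._answers[problem.task_id]


class ApiModel:
    """Hypothetical placeholder for a real model client."""

    def __init__(self, name: str) -> None:
        self.name = name

    def complete(self, problem: MbppProblem) -> str:
        raise NotImplementedError(f"API call for {self.name} is out of scope here")


def build_models(config: dict, problems: List[MbppProblem]) -> Dict[str, object]:
    # "human" in the (hypothetical) answering_models list selects the dummy model.
    models: Dict[str, object] = {}
    for name in config["answering_models"]:
        models[name] = HumanModel(problems) if name == "human" else ApiModel(name)
    return models


# Usage with a single made-up problem and a hypothetical config.
problems = [MbppProblem(1, "Write a function that adds two numbers.",
                        "def add(a, b):\n    return a + b")]
config = {"answering_models": ["human", "some-llm"]}
models = build_models(config, problems)
human_answer = models["human"].complete(problems[0])
```

Because the dummy model exposes the same complete method as a real client, the rest of the pipeline can treat the ground truth as just another answering model.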
How to set up and validate locally
Numbered steps to set up and validate the change are strongly suggested.
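As one possible local check (an assumption, not the author's validation steps), the ground-truth completions returned by the HUMAN model should pass the MBPP test assertions for their own task. MBPP stores each task's tests as assert statements; the passes_tests helper below is hypothetical and runs them with exec, which is only acceptable here because the reference solutions are trusted code.

```python
from typing import List


def passes_tests(solution: str, tests: List[str]) -> bool:
    """Return True if `solution` defines code that satisfies every assert in `tests`."""
    namespace: dict = {}
    try:
        # Define the solution's functions, then run each assert in the same namespace.
        exec(solution, namespace)
        for test in tests:
            exec(test, namespace)
    except Exception:
        # Any failing assert or runtime error counts as a failure.
        return False
    return True


# A ground-truth answer should pass its own tests; a wrong one should not.
ground_truth_ok = passes_tests("def add(a, b):\n    return a + b", ["assert add(1, 2) == 3"])
wrong_answer_ok = passes_tests("def add(a, b):\n    return a - b", ["assert add(1, 2) == 3"])
```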
Merge request checklist
- I've run the affected pipeline(s) to validate that nothing is broken.
- Tests added for new functionality. If not, please raise an issue to follow up.
- Documentation added/updated, if needed.