Added ability to compare with ground truth

Hongtao Yang requested to merge hyang/compare-with-ground-truth into main

What does this merge request do and why?

This MR adds the ability to compare model answers with the ground truth for the MBPP dataset.

  • Added a dummy HUMAN model. We first read in all ground truth answers, then mimic a model API call through the dummy HUMAN model: it simply looks up the ground truth answers and returns them as the HUMAN model's completions.
  • If we want to compare with ground truth, just specify "human" as one of the answering models in the config.
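
A minimal sketch of how such a dummy model could work, for illustration only. The names `HumanModel`, `get_completion`, and `call_api`, and the idea of keying answers by task id, are assumptions and not taken from this MR's code:

```python
from dataclasses import dataclass


@dataclass
class HumanModel:
    """Hypothetical stand-in for the HUMAN dummy model."""

    # Assumed layout: task id -> ground truth solution, loaded up front.
    ground_truth: dict
    name: str = "human"

    def complete(self, task_id):
        # No API call is made: the "completion" is the stored reference answer.
        return self.ground_truth[task_id]


def get_completion(model_name, task_id, prompt, human, call_api):
    """Route "human" to the lookup and every other model to the real API.

    `call_api` is a hypothetical callable (model_name, prompt) -> completion.
    """
    if model_name.lower() == "human":
        return human.complete(task_id)
    return call_api(model_name, prompt)
```

With a dispatcher of this shape, listing "human" among the answering models lets the ground truth flow through the same comparison path as any other model's completions.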

How to set up and validate locally

Numbered steps to set up and validate the change are strongly suggested.

Merge request checklist

  • I've run the affected pipeline(s) to validate that nothing is broken.
  • Tests added for new functionality. If not, please raise an issue to follow up.
  • Documentation added/updated, if needed.
