Add similarity evaluator for Duo Code Review


We'd like to add a new eli5 evaluator that compares the output of Duo Code Review with human reviews, to evaluate the effectiveness of Duo Code Review.

Eventually we might want to use a curated list of MRs that includes what we're looking for, but we can start with existing GitLab MRs from gitlab.com.

For this work, we would need to:

  • Prepare/decide on a new dataset schema that would work for this evaluator (probably needs some help from the AI Framework team)

    • If we can find it, the script that was used to generate dataset.duo_code_review.1 could serve as an example
    • We'd need at least these columns to begin with
      • mr_title
      • mr_description
      • diffs (entire diffs for a MR)
      • diff_notes (formatted to match DCR output as closely as possible so that we can compare this against DCR reviews)
      • Do note that this dataset will change as we go, since we're planning to provide more context to DCR over time
  • Implement the new evaluation in eli5 (note: the source location of the repo is being migrated to this) using the new dataset

  • Outdated code in ee/lib/api/duo_code_review.rb would also need to be updated
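To make the proposed schema and comparison concrete, here is a minimal sketch of one dataset row with the four columns listed above, plus a naive lexical similarity between a human `diff_notes` entry and a DCR-style note. This does not use the real eli5 API; all values are invented, and `note_similarity` is only a stand-in (the actual evaluator would more likely use an LLM judge or embedding similarity).

```python
# Hypothetical sketch only: illustrative dataset row + toy similarity metric.
from difflib import SequenceMatcher

# One row matching the proposed columns.
sample_row = {
    "mr_title": "Fix N+1 query in MergeRequestsFinder",
    "mr_description": "Preloads author associations to avoid N+1 queries.",
    "diffs": "@@ -10,7 +10,7 @@ ...",  # entire diffs for the MR
    "diff_notes": [
        # Formatted to resemble a DCR review comment as closely as possible.
        "Consider preloading :author here to avoid an N+1 query."
    ],
}

def note_similarity(human_note: str, dcr_note: str) -> float:
    """Return a 0..1 lexical similarity between two review notes.

    SequenceMatcher is a placeholder; a production evaluator would
    likely compare notes semantically rather than character-wise.
    """
    return SequenceMatcher(None, human_note.lower(), dcr_note.lower()).ratio()

score = note_similarity(
    sample_row["diff_notes"][0],
    "Preload :author to avoid an N+1 query in this method.",
)
print(f"similarity: {score:.2f}")
```

The same row shape could later grow extra columns (e.g. additional MR context) without breaking the comparison logic, which only reads `diff_notes`.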

Related to #490963 (comment 2429262919)
