
Merge all evaluation script efforts, prompts, and testing results into the main branch

Dylan Bernardi requested to merge v1-evaluation into main

Scope of MR

The full scope of the project can be found here, and the epic here. This MR focuses specifically on creating, implementing, and testing a model evaluation script.

This MR includes five main components:

  1. evaluation.py - The script that runs the model tests. Its input is a .txt file in which each line is a separate prompt, and its output is the generated code.
  2. prompts - This folder contains all the prompts used in testing. The generation prompts are split by language for organization, but they are mostly the same plain English with a few word changes (for example, C does not have a standard dictionary type). The completion prompts differ by language because they are written in the language being tested.
  3. generation results - This folder contains all the results for code generation.
  4. completion results - This folder contains all the results for code completion.
  5. documentation - A brief overview of the effort and instructions for running the make targets.
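The input/output contract described for evaluation.py (a .txt file with one prompt per line in, generated code out) can be illustrated with a small Python loop. This is a minimal, hypothetical sketch, not the contents of evaluation.py: the `generate` callable standing in for the model, the `load_prompts`/`run_evaluation` helper names, and the `result_NNN.txt` output naming are all assumptions made for illustration.

```python
"""Hypothetical sketch of a one-prompt-per-line evaluation loop.

Assumptions (not taken from evaluation.py): the model is exposed as a
plain callable `generate(prompt) -> str`, and each result is written to
its own file in an output directory.
"""
import tempfile
from pathlib import Path
from typing import Callable, List


def load_prompts(path: Path) -> List[str]:
    """Read a .txt file where each non-empty line is one prompt."""
    lines = path.read_text(encoding="utf-8").splitlines()
    return [line.strip() for line in lines if line.strip()]


def run_evaluation(
    prompts_file: Path,
    output_dir: Path,
    generate: Callable[[str], str],
) -> List[Path]:
    """Send each prompt to `generate` and save the returned code.

    Returns the result files written, in prompt order.
    """
    output_dir.mkdir(parents=True, exist_ok=True)
    written: List[Path] = []
    for index, prompt in enumerate(load_prompts(prompts_file), start=1):
        code = generate(prompt)
        result_path = output_dir / f"result_{index:03d}.txt"
        # Store the prompt above the output so reviewers can compare them.
        result_path.write_text(f"PROMPT: {prompt}\n---\n{code}\n", encoding="utf-8")
        written.append(result_path)
    return written


if __name__ == "__main__":
    def fake_model(prompt: str) -> str:
        # Stand-in for a real model call (hypothetical); returns a stub.
        return f"def solution():\n    pass  # {prompt}"

    with tempfile.TemporaryDirectory() as tmp:
        tmp_path = Path(tmp)
        prompts = tmp_path / "prompts.txt"
        prompts.write_text("Reverse a string\n\nSum a list of integers\n", encoding="utf-8")
        results = run_evaluation(prompts, tmp_path / "results", fake_model)
        print([p.name for p in results])  # prints ['result_001.txt', 'result_002.txt']
```

Keeping the model behind a plain callable lets the same loop be pointed at different models, and storing each prompt next to its output keeps the generation and completion result folders easy to review by hand.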

Associated Issues

Test runner: https://gitlab.com/gitlab-org/gitlab/-/issues/415774+

JavaScript - completion: https://gitlab.com/gitlab-org/gitlab/-/issues/415783+
JavaScript - generation: https://gitlab.com/gitlab-org/gitlab/-/issues/415782+

Python - completion: https://gitlab.com/gitlab-org/gitlab/-/issues/415780+
Python - generation: https://gitlab.com/gitlab-org/gitlab/-/issues/415772+

C - completion: https://gitlab.com/gitlab-org/gitlab/-/issues/415788+
C - generation: https://gitlab.com/gitlab-org/gitlab/-/issues/415787+

Golang - completion: https://gitlab.com/gitlab-org/gitlab/-/issues/415786+
Golang - generation: https://gitlab.com/gitlab-org/gitlab/-/issues/415785+

Follow-up(s)

As with any work at GitLab, some follow-up items have already been identified for the next iteration of the model evaluation scripts and of the overall testing method. The issues below are some of the follow-ups under consideration:

https://gitlab.com/gitlab-org/gitlab/-/issues/416330+
https://gitlab.com/gitlab-org/gitlab/-/issues/416329+

cc @sean_carroll @mray2020 @srayner @allison.browne @andrei.zubov

Closes https://gitlab.com/gitlab-org/gitlab/-/issues/415774+

Edited by Sean Carroll
