/refactor Evaluation
This issue is to capture work for the Custom Models team to contribute to validation dataset creation for /refactor.
While /refactor is executed within Chat, the underlying functionality (and thereby dataset creation) is owned by Code Creation. As such, Custom Models will collaborate with Code Creation on these datasets.
#### Background
The [current prompt for /refactor](https://gitlab.com/gitlab-org/modelops/applied-ml/code-suggestions/ai-assist/-/tree/main/ai_gateway/prompts/definitions/chat/refactor_code?ref_type=heads) includes the following context, split between the user and system prompts:
* [`file_content`](https://gitlab.com/gitlab-org/modelops/applied-ml/code-suggestions/ai-assist/-/blob/main/ai_gateway/prompts/definitions/chat/refactor_code/user.jinja?ref_type=heads)
* [`selected_text`](https://gitlab.com/gitlab-org/modelops/applied-ml/code-suggestions/ai-assist/-/blob/main/ai_gateway/prompts/definitions/chat/refactor_code/user.jinja?ref_type=heads)
* [`input`](https://gitlab.com/gitlab-org/modelops/applied-ml/code-suggestions/ai-assist/-/blob/main/ai_gateway/prompts/definitions/chat/refactor_code/user.jinja?ref_type=heads)
* [`file_content_reuse`](https://gitlab.com/gitlab-org/modelops/applied-ml/code-suggestions/ai-assist/-/blob/main/ai_gateway/prompts/definitions/chat/refactor_code/user.jinja?ref_type=heads)
* [`language_info`](https://gitlab.com/gitlab-org/modelops/applied-ml/code-suggestions/ai-assist/-/blob/main/ai_gateway/prompts/definitions/chat/refactor_code/system.jinja?ref_type=heads)
#### Proposal
Custom Models will collaborate with Code Creation to help create a validation dataset for /refactor. There are several potential sources from which we can draw data for a /refactor dataset. The strong preference is to use historical GitLab user data:
1. draw from historical data and [Chat bash data](https://gitlab.com/gitlab-org/ux-research/-/issues/2513 "Repeated analysis of user experience with Duo chat as we continuously improve the chat (also considering the outcome of each round of these user tests.)") - [spreadsheet](https://docs.google.com/spreadsheets/d/1IJDzjUuAwpJW2qgW4qRW5DPr0AyHjdEr7auLVZ3xJ0w/edit?gid=1056599973#gid=1056599973). Chat bash datasets currently include 17 examples of refactor requests to GitLab Duo Chat, found in the [Refactor tab](https://docs.google.com/spreadsheets/d/1IJDzjUuAwpJW2qgW4qRW5DPr0AyHjdEr7auLVZ3xJ0w/edit?gid=1681866418#gid=1681866418).
2. fetch commits from gitlab-org/gitlab that are labeled `refactor`
3. adapt an open source public dataset like [CodeEditorBench](https://huggingface.co/datasets/m-a-p/CodeEditorBench)
4. generate examples by 'unrefactoring' code
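
As a rough sketch of option 2: commits themselves do not carry labels in GitLab, so this example assumes the intent is to list merged merge requests that have the `refactor` label and then inspect their commits. It only builds the request URL for the merge requests API; the helper name `merged_mr_url` is hypothetical, and whether `refactor` is the exact label used in gitlab-org/gitlab should be confirmed before relying on it.

```python
from urllib.parse import quote, urlencode


def merged_mr_url(
    project_path: str,
    label: str,
    page: int = 1,
    per_page: int = 100,
    base: str = "https://gitlab.com/api/v4",
) -> str:
    """Build a URL that lists merged merge requests carrying `label`.

    Assumption: labelled merge requests stand in for "labeled commits".
    """
    # Project paths must be URL-encoded when used as the :id path segment.
    project_id = quote(project_path, safe="")
    query = urlencode(
        {"labels": label, "state": "merged", "per_page": per_page, "page": page}
    )
    return f"{base}/projects/{project_id}/merge_requests?{query}"


print(merged_mr_url("gitlab-org/gitlab", "refactor"))
```

The response is paginated, so a real collection script would increment `page` until an empty list is returned.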
##### Iteration I
The first iteration will adapt the open source dataset [CodeEditorBench](https://huggingface.co/datasets/m-a-p/CodeEditorBench), which has the schema shown below. To this dataset we will add the 17 examples of refactor requests to GitLab Duo Chat from the [Refactor tab](https://docs.google.com/spreadsheets/d/1IJDzjUuAwpJW2qgW4qRW5DPr0AyHjdEr7auLVZ3xJ0w/edit?gid=1681866418#gid=1681866418) in the [Chat bash data](https://gitlab.com/gitlab-org/ux-research/-/issues/2513 "Repeated analysis of user experience with Duo chat as we continuously improve the chat (also considering the outcome of each round of these user tests.)") - [spreadsheet](https://docs.google.com/spreadsheets/d/1IJDzjUuAwpJW2qgW4qRW5DPr0AyHjdEr7auLVZ3xJ0w/edit?gid=1056599973#gid=1056599973).
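<table>
<tr>
<th>Column</th>
<th>Type</th>
</tr>
<tr>
<td>idx</td>
<td>int64</td>
</tr>
<tr>
<td>title</td>
<td>string</td>
</tr>
<tr>
<td>code_language</td>
<td>string</td>
</tr>
<tr>
<td>incorrect_solutions</td>
<td>string</td>
</tr>
<tr>
<td>solutions</td>
<td>string</td>
</tr>
<tr>
<td>type</td>
<td>string</td>
</tr>
<tr>
<td>difficulty</td>
<td>string</td>
</tr>
<tr>
<td>public_tests_input</td>
<td>string</td>
</tr>
<tr>
<td>public_tests_output</td>
<td>string</td>
</tr>
<tr>
<td>private_tests_input</td>
<td>sequence</td>
</tr>
<tr>
<td>private_tests_output</td>
<td>sequence</td>
</tr>
</table>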
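
To illustrate what "adapting" could look like, here is a minimal sketch that maps one CodeEditorBench-style row onto the /refactor prompt inputs listed in the Background section. The mapping itself is an assumption, not an agreed design: it treats `incorrect_solutions` as the code to refactor, uses it as both `file_content` and `selected_text`, keeps `solutions` as a reference answer, and leaves `file_content_reuse` out. `RefactorExample`, `to_refactor_example`, and the default instruction text are hypothetical names introduced for this example.

```python
from dataclasses import dataclass
from typing import Any, Mapping


@dataclass(frozen=True)
class RefactorExample:
    """One validation example shaped like the /refactor prompt inputs."""

    file_content: str
    selected_text: str
    input: str
    language_info: str
    reference: str  # expected answer, used for scoring only


def to_refactor_example(
    row: Mapping[str, Any],
    instruction: str = "Refactor this code.",  # hypothetical user request
) -> RefactorExample:
    # Assumption: the pre-edit code lives in `incorrect_solutions`.
    source = row["incorrect_solutions"]
    return RefactorExample(
        file_content=source,
        selected_text=source,  # assumption: the whole file is selected
        input=instruction,
        language_info=row["code_language"],
        reference=row["solutions"],
    )
```

The 17 Chat bash examples could be converted into the same `RefactorExample` shape so that both sources can be evaluated together.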
### Definition of Done
A first iteration of a validation dataset for /refactor has been completed with 70 to 120 prompts, in accordance with [Playbook recommendations](https://docs.gitlab.com/ee/development/ai_features/ai_feature_development_playbook.html#key-considerations-when-making-a-dataset).
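
For sizing, a small worked calculation: given the 17 Chat bash examples already available, this sketch computes how many adapted examples are needed to land within the 70 to 120 prompt range. The function name `adapted_examples_needed` is hypothetical, and the calculation assumes all 17 existing examples are kept.

```python
def adapted_examples_needed(existing: int, low: int, high: int) -> tuple[int, int]:
    """Return the (minimum, maximum) number of extra examples to add."""
    if low > high:
        raise ValueError("low must not exceed high")
    # Never return a negative count if the existing set already suffices.
    return max(low - existing, 0), max(high - existing, 0)


print(adapted_examples_needed(existing=17, low=70, high=120))  # (53, 103)
```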