/refactor Evaluation
This issue captures work for the Custom Models team to contribute to validation dataset creation for /refactor. While /refactor is executed within Chat, the underlying functionality (and thereby dataset creation) is owned by Code Creation. As such, Custom Models will collaborate with Code Creation on these datasets.

#### Background

The [current prompt for /refactor](https://gitlab.com/gitlab-org/modelops/applied-ml/code-suggestions/ai-assist/-/tree/main/ai_gateway/prompts/definitions/chat/refactor_code?ref_type=heads) includes the following context across the user and system prompts:

* [`file_content`](https://gitlab.com/gitlab-org/modelops/applied-ml/code-suggestions/ai-assist/-/blob/main/ai_gateway/prompts/definitions/chat/refactor_code/user.jinja?ref_type=heads)
* [`selected_text`](https://gitlab.com/gitlab-org/modelops/applied-ml/code-suggestions/ai-assist/-/blob/main/ai_gateway/prompts/definitions/chat/refactor_code/user.jinja?ref_type=heads)
* [`input`](https://gitlab.com/gitlab-org/modelops/applied-ml/code-suggestions/ai-assist/-/blob/main/ai_gateway/prompts/definitions/chat/refactor_code/user.jinja?ref_type=heads)
* [`file_content_reuse`](https://gitlab.com/gitlab-org/modelops/applied-ml/code-suggestions/ai-assist/-/blob/main/ai_gateway/prompts/definitions/chat/refactor_code/user.jinja?ref_type=heads)
* [`language_info`](https://gitlab.com/gitlab-org/modelops/applied-ml/code-suggestions/ai-assist/-/blob/main/ai_gateway/prompts/definitions/chat/refactor_code/system.jinja?ref_type=heads)

#### Proposal

Custom Models will collaborate with Code Creation to help create a validation dataset for /refactor. There are several potential sources from which we can draw data for inclusion in a /refactor dataset. The strong preference is to use historical GitLab user data:

1. Draw from historical data and [Chat bash data](https://gitlab.com/gitlab-org/ux-research/-/issues/2513 "Repeated analysis of user experience with Duo chat as we continuously improve the chat (also considering the outcome of each round of these user tests.)") - [spreadsheet](https://docs.google.com/spreadsheets/d/1IJDzjUuAwpJW2qgW4qRW5DPr0AyHjdEr7auLVZ3xJ0w/edit?gid=1056599973#gid=1056599973). Chat bash datasets currently include 17 examples of refactor requests to GitLab Duo Chat, found in the [Refactor tab](https://docs.google.com/spreadsheets/d/1IJDzjUuAwpJW2qgW4qRW5DPr0AyHjdEr7auLVZ3xJ0w/edit?gid=1681866418#gid=1681866418).
2. Fetch commits from gitlab-org/gitlab that are labeled `refactor`.
3. Adapt an open source public dataset like [CodeEditorBench](https://huggingface.co/datasets/m-a-p/CodeEditorBench).
4. Generate examples by 'unrefactoring' code.

##### Iteration I

The first iteration will adapt the open source dataset [CodeEditorBench](https://huggingface.co/datasets/m-a-p/CodeEditorBench), which has the schema below. To this dataset we will add the 17 examples of refactor requests to GitLab Duo Chat from the [Refactor tab](https://docs.google.com/spreadsheets/d/1IJDzjUuAwpJW2qgW4qRW5DPr0AyHjdEr7auLVZ3xJ0w/edit?gid=1681866418#gid=1681866418) of the [Chat bash data](https://gitlab.com/gitlab-org/ux-research/-/issues/2513 "Repeated analysis of user experience with Duo chat as we continuously improve the chat (also considering the outcome of each round of these user tests.)") - [spreadsheet](https://docs.google.com/spreadsheets/d/1IJDzjUuAwpJW2qgW4qRW5DPr0AyHjdEr7auLVZ3xJ0w/edit?gid=1056599973#gid=1056599973).

<table>
<tr> <th>Column</th> <th>Type</th> </tr>
<tr> <td>idx</td> <td>int64</td> </tr>
<tr> <td>title</td> <td>string</td> </tr>
<tr> <td>code_language</td> <td>string</td> </tr>
<tr> <td>incorrect_solutions</td> <td>string</td> </tr>
<tr> <td>solutions</td> <td>string</td> </tr>
<tr> <td>type</td> <td>string</td> </tr>
<tr> <td>difficulty</td> <td>string</td> </tr>
<tr> <td>public_tests_input</td> <td>string</td> </tr>
<tr> <td>public_tests_output</td> <td>string</td> </tr>
<tr> <td>private_tests_input</td> <td>sequence</td> </tr>
<tr> <td>private_tests_output</td> <td>sequence</td> </tr>
</table>

### Definition of Done

A first iteration of a validation dataset for /refactor has been completed with at least 70 prompts (target range of 70 to 120), in accordance with [Playbook recommendations](https://docs.gitlab.com/ee/development/ai_features/ai_feature_development_playbook.html#key-considerations-when-making-a-dataset).
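
As a rough illustration of how Iteration I could be assembled, the sketch below maps one CodeEditorBench row and one Chat bash example onto the /refactor prompt inputs listed under Background. The column-to-input mapping is an assumption made for illustration, not an agreed design: it treats `incorrect_solutions` as the code to refactor and `solutions` as the reference answer. `RefactorExample`, `reference`, `source`, `from_code_editor_bench`, `from_chat_bash`, and `meets_size_target` are hypothetical names, and `file_content_reuse` is left out because its intended value is not described in this issue.

```python
from dataclasses import dataclass

# Columns of the CodeEditorBench schema listed in the table above.
CODE_EDITOR_BENCH_COLUMNS = (
    "idx", "title", "code_language", "incorrect_solutions", "solutions",
    "type", "difficulty", "public_tests_input", "public_tests_output",
    "private_tests_input", "private_tests_output",
)

MIN_PROMPTS = 70  # lower bound from the Definition of Done


@dataclass
class RefactorExample:
    """One validation prompt (hypothetical container)."""
    file_content: str    # /refactor prompt input
    selected_text: str   # /refactor prompt input
    input: str           # /refactor prompt input (free-text user request)
    language_info: str   # /refactor prompt input
    reference: str       # hypothetical: expected refactored code, if known
    source: str          # hypothetical: provenance tag for later filtering


def from_code_editor_bench(row):
    """Convert one CodeEditorBench row (a dict keyed by column name)."""
    missing = [c for c in CODE_EDITOR_BENCH_COLUMNS if c not in row]
    if missing:
        raise KeyError(f"row is missing columns: {missing}")
    code = row["incorrect_solutions"]  # assumption: the code to be refactored
    return RefactorExample(
        file_content=code,
        selected_text=code,  # assumption: the whole snippet is selected
        input="",            # these rows carry no free-text user request
        language_info=row["code_language"],
        reference=row["solutions"],
        source=f"CodeEditorBench:{row['idx']}",
    )


def from_chat_bash(question, code, language, row_number):
    """Convert one Chat bash refactor request; the parameters are hypothetical."""
    return RefactorExample(
        file_content=code,
        selected_text=code,
        input=question,
        language_info=language,
        reference="",  # no reference answer is assumed to exist
        source=f"ChatBash:{row_number}",
    )


def meets_size_target(examples, minimum=MIN_PROMPTS):
    """Check the prompt count against the Definition of Done lower bound."""
    return len(examples) >= minimum
```

Keeping a `source` tag on every example would make it possible to report results for the historical Chat bash examples separately from the adapted open source rows.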