Self-Hosted Model Deployment - Code Suggestions
This epic captures the existing plan and iterative cadence for adding Custom Model support for self-hosted (open source model-based) Code Suggestions. Our iteration will follow a data-driven flow, predicated upon:
* identifying open source, licensable models fit for the Code Generation and Code Completion use cases
* validating and baselining performance of our selected open source model against the Code Generation and Code Completion use cases
* prompt iteration and testing
Work supporting Code Suggestions GA has been pulled out into [this epic](https://gitlab.com/groups/gitlab-org/-/epics/15176), and will be moved back to this base epic following GA.
Additional information on the timeline for Code Suggestions support can be found on the Custom Models [roadmap](https://gitlab.com/groups/gitlab-org/-/epics/14665).
### References
* [Code Suggestions GA Epic](https://gitlab.com/groups/gitlab-org/-/epics/15176)
* [Code Suggestions Prompts in the AIG](https://gitlab.com/gitlab-org/modelops/applied-ml/code-suggestions/ai-assist/-/tree/670ea7fc9c2f8c7912b550cde166a9d59386c9d5/ai_gateway/prompts/definitions/code_suggestions)
# Validation/Quality Results
* [Code Suggestions Validation Results](https://docs.google.com/spreadsheets/d/1qdVO1yQhFPIzRzRrHzDVcu3aNGDgXTVaYRlYJH-0K0A/edit?gid=553333807#gid=553333807)
<table>
<tr>
<th rowspan="3">Model</th>
<th rowspan="3">GA?</th>
<th colspan="2">Code Suggestions</th>
</tr>
<tr>
<th>
Code Generation (cosine similarity)
</th>
<th>Code Completion</th>
</tr>
<tr>
<th>model similarity score</th>
<th>model similarity score</th>
</tr>
<tr>
<td>
**Claude 3.5 Sonnet on AWS Bedrock**
</td>
<td>
:white_check_mark:
</td>
<td>
[.89](https://gitlab.com/gitlab-org/gitlab/-/issues/485605#note_2150198467)
</td>
<td>
[.92](https://gitlab.com/gitlab-org/gitlab/-/issues/495789#note_2163384828)
</td>
</tr>
<tr>
<td>Claude 3 Haiku on AWS Bedrock</td>
<td>
:x:
</td>
<td>
[pending](https://gitlab.com/gitlab-org/gitlab/-/issues/504436)
</td>
<td>
[pending](https://gitlab.com/gitlab-org/gitlab/-/issues/504437)
</td>
</tr>
<tr>
<td>
**Mistral 7B-it**
</td>
<td>
:white_check_mark:
</td>
<td>
[.86](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/153423#note_1914296459)
</td>
<td>
[.77](https://gitlab.com/gitlab-org/gitlab/-/issues/475104#note_2085789984)
</td>
</tr>
<tr>
<td>Mistral 7B v3</td>
<td>
:x:
</td>
<td>
[pending](https://gitlab.com/gitlab-org/gitlab/-/issues/499889)
</td>
<td>
[.79](https://gitlab.com/gitlab-org/gitlab/-/issues/475104#note_2085789984)
</td>
</tr>
<tr>
<td>
**Mistral 8x7B-it v1**
</td>
<td>
:white_check_mark:
</td>
<td>
[.88](https://gitlab.com/gitlab-org/gitlab/-/issues/455303#note_1928017257)
</td>
<td>
[.74](https://gitlab.com/gitlab-org/gitlab/-/issues/475104#note_2085789984)
</td>
</tr>
<tr>
<td>Mistral 8x7B v1</td>
<td>
:x:
</td>
<td>
[pending](https://gitlab.com/gitlab-org/gitlab/-/issues/499889)
</td>
<td>
[.81](https://gitlab.com/gitlab-org/gitlab/-/issues/475104#note_2085789984)
</td>
</tr>
<tr>
<td>
**Mistral 8x22B-it**
</td>
<td>
:white_check_mark:
</td>
<td>
[.89](https://gitlab.com/gitlab-org/gitlab/-/issues/455303#note_1933176174)
</td>
<td>
[.88](https://gitlab.com/gitlab-org/gitlab/-/issues/493932#note_2185430542)
</td>
</tr>
<tr>
<td>Mistral 8x22B</td>
<td>
:x:
</td>
<td>
[pending](https://gitlab.com/gitlab-org/gitlab/-/issues/499889)
</td>
<td>
[.75](https://gitlab.com/gitlab-org/gitlab/-/issues/493932#note_2185430542) / [pending](https://gitlab.com/gitlab-org/gitlab/-/issues/502692)
</td>
</tr>
<tr>
<td>
**Codestral 22B-it**
</td>
<td>
:white_check_mark:
</td>
<td>
[.83](https://gitlab.com/gitlab-org/modelops/applied-ml/code-suggestions/ai-assist/-/merge_requests/1772)
</td>
<td>
[.83](https://gitlab.com/gitlab-org/modelops/applied-ml/code-suggestions/ai-assist/-/merge_requests/1772)
</td>
</tr>
<tr>
<td>
**gpt-4-turbo**
</td>
<td>
:white_check_mark:
</td>
<td>
[.89](https://gitlab.com/gitlab-org/gitlab/-/issues/473745#note_2150412307)
</td>
<td>
[.91](https://gitlab.com/gitlab-org/gitlab/-/issues/473746#note_2163718822)
</td>
</tr>
<tr>
<td>
**gpt-4o-mini**
</td>
<td>
:white_check_mark:
</td>
<td>
[.89](https://gitlab.com/gitlab-org/gitlab/-/issues/473745#note_2150412307)
</td>
<td>
[.89](https://gitlab.com/gitlab-org/gitlab/-/issues/473746#note_2163718822)
</td>
</tr>
<tr>
<td>
**gpt-4**
</td>
<td>
:x:
</td>
<td>
[not supported on AzureOpenAI](https://gitlab.com/gitlab-org/gitlab/-/issues/473746#note_2170640563)
</td>
<td>
[not supported on AzureOpenAI](https://gitlab.com/gitlab-org/gitlab/-/issues/473746#note_2170640563)
</td>
</tr>
<tr>
<td>
**gpt-4o**
</td>
<td>
:white_check_mark:
</td>
<td>
[.89](https://gitlab.com/gitlab-org/gitlab/-/issues/473745#note_2150412307)
</td>
<td>
[.90](https://gitlab.com/gitlab-org/gitlab/-/issues/473746#note_2163718822)
</td>
</tr>
<tr>
<td>
**gpt-3.5-turbo**
</td>
<td>
:x:
</td>
<td>
[.87](https://gitlab.com/gitlab-org/gitlab/-/issues/473745#note_2150412307)
</td>
<td>
[.89](https://gitlab.com/gitlab-org/gitlab/-/issues/473746#note_2163718822)
</td>
</tr>
<tr>
<td>Meta Llama-3-70B-Instruct</td>
<td>
:x:
</td>
<td>
[pending](https://gitlab.com/gitlab-org/gitlab/-/issues/512670)
</td>
<td>
[pending](https://gitlab.com/gitlab-org/gitlab/-/issues/475101)
</td>
</tr>
<tr>
<td>Meta Llama-3-8B-Instruct</td>
<td>
:x:
</td>
<td>
[pending](https://gitlab.com/gitlab-org/gitlab/-/issues/512670)
</td>
<td>
[pending](https://gitlab.com/gitlab-org/gitlab/-/issues/475101)
</td>
</tr>
<tr>
<td>Meta Llama-3.1-8B-Instruct</td>
<td>
:x:
</td>
<td>
[pending](https://gitlab.com/gitlab-org/gitlab/-/issues/512670)
</td>
<td>
[pending](https://gitlab.com/gitlab-org/gitlab/-/issues/475101)
</td>
</tr>
<tr>
<td>Meta Llama-3.1-70B-Instruct</td>
<td>
:x:
</td>
<td>
[pending](https://gitlab.com/gitlab-org/gitlab/-/issues/512670)
</td>
<td>
[pending](https://gitlab.com/gitlab-org/gitlab/-/issues/475101)
</td>
</tr>
<tr>
<td>Code Gemma 7B-it</td>
<td>
:x:
</td>
<td>
[.88](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/155899#evaluation-results)
</td>
<td>x</td>
</tr>
<tr>
<td>Code Gemma 7B</td>
<td>
:x:
</td>
<td></td>
<td>
[.70](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/153966#evaluation-results)
</td>
</tr>
<tr>
<td>Code Gemma 2b</td>
<td>
:x:
</td>
<td></td>
<td>
[.72](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/153966#evaluation-results)
</td>
</tr>
<tr>
<td>Code Llama 13B</td>
<td>
:x:
</td>
<td>
[.88](https://gitlab.com/gitlab-org/gitlab/-/issues/467439#note_1990748005)
</td>
<td>
[.73](https://gitlab.com/gitlab-org/gitlab/-/issues/467438#note_1993498296)
</td>
</tr>
<tr>
<td>Code Llama 7B</td>
<td>
:x:
</td>
<td></td>
<td>
[.74](https://gitlab.com/gitlab-org/gitlab/-/issues/467438#note_1993498296)
</td>
</tr>
<tr>
<td>DeepSeekCoder 1.3B base</td>
<td>
:x:
</td>
<td>
[.7621](https://gitlab.com/gitlab-org/gitlab/-/issues/471074#note_2035717648)
</td>
<td>
[.8083](https://gitlab.com/gitlab-org/gitlab/-/issues/471074)
</td>
</tr>
<tr>
<td>DeepSeekCoder 6.7B base</td>
<td>
:x:
</td>
<td>
[.7822](https://gitlab.com/gitlab-org/gitlab/-/issues/471074#note_2035717648)
</td>
<td>
[.8219](https://gitlab.com/gitlab-org/gitlab/-/issues/471074)
</td>
</tr>
<tr>
<td>DeepSeekCoder 7B base</td>
<td>
:x:
</td>
<td></td>
<td></td>
</tr>
<tr>
<td>DeepSeekCoder 33B base</td>
<td>
:x:
</td>
<td>
[.802](https://gitlab.com/gitlab-org/gitlab/-/issues/471074#note_2035717648)
</td>
<td>
[.8146](https://gitlab.com/gitlab-org/gitlab/-/issues/471074)
</td>
</tr>
<tr>
<td>DeepSeekCoder 1.3B-it</td>
<td>
:x:
</td>
<td>
[.8034](https://gitlab.com/gitlab-org/gitlab/-/issues/471074)
</td>
<td></td>
</tr>
<tr>
<td>DeepSeekCoder 6.7B-it</td>
<td>
:x:
</td>
<td>
[.82](https://gitlab.com/gitlab-org/gitlab/-/issues/471074)
</td>
<td>
[.7546](https://gitlab.com/gitlab-org/gitlab/-/issues/471074)
</td>
</tr>
<tr>
<td>DeepSeekCoder 7B-it</td>
<td>
:x:
</td>
<td></td>
<td></td>
</tr>
<tr>
<td>DeepSeekCoder 33B-it</td>
<td>
:x:
</td>
<td>
[.81](https://gitlab.com/gitlab-org/gitlab/-/issues/471074)
</td>
<td></td>
</tr>
</table>
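
For readers unfamiliar with the metric, the following is a minimal sketch of how a cosine similarity score between two embedding vectors can be computed. It assumes that the similarity scores above compare an embedding of the model output against an embedding of a reference answer; the embedding step and the function name `cosine_similarity` are illustrative assumptions, not the evaluation pipeline's actual code.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # A zero vector has no direction; treat it as "no similarity".
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)
```

A score near 1 means the two vectors point in nearly the same direction, so under this assumption a higher value in the table indicates output closer to the reference.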
## References
* [Code Suggestion Prompts](https://gitlab.com/gitlab-org/modelops/applied-ml/code-suggestions/ai-assist/-/tree/main/ai_gateway/prompts/definitions/code_suggestions)
# Background
[Code Suggestions](https://internal.gitlab.com/handbook/product/ai-strategy/code-suggestions/) can currently be understood through its two main use cases: code completion (fill in the middle) and code generation (generated from a comment block or function signature). Each use case has its own model. The determination of [which model](https://gitlab.com/gitlab-org/gitlab/-/blob/5836b418aebefe4cc93f072d61c615ea8a104453/ee/lib/code_suggestions/task_selector.rb#L30) to trigger [occurs in the IDE extension via Tree-sitter](https://gitlab.com/groups/gitlab-org/-/epics/11568). [Tree-sitter](https://tree-sitter.github.io/tree-sitter/) is an incremental parsing library that can build a concrete syntax tree for a source file. It uses a plugin paradigm, allowing many different programming languages to be parsed and analyzed through a single query interface. Telemetry is currently collected in the IDE (and can only be collected there). For each use case, we will need to consider the pre- and post-processing steps currently integrated into the Code Suggestions flow, as well as prompting.
#### Code Completion
* uses code-gecko (designed for code completion), which has a smaller input token limit
* also uses Anthropic with associated [template based prompting](https://gitlab.com/gitlab-org/gitlab/-/blob/84246fc668fcd3c70773b2b39cedf28ac9e1e261/ee/lib/code_suggestions/prompts/code_completion/anthropic.rb)
* context is the entire file (what is before and after the cursor), but the whole file may not fit due to context window limitations
* trigger is the update on the editor input
* cached at the line level so as to avoid repeat queries to the LLM
* currently no prompting from our end
* pre-processing: context trimming (trims from the start of a line in a window before/after)
* post-processing
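
The context trimming and line-level caching described above can be sketched as follows. This is an illustrative approximation only: the function `trim_context`, the class `LineCache`, the character-based window sizes, and the cache key are all assumptions, not the actual Code Suggestions implementation.

```python
from typing import Callable, Dict, Tuple


def trim_context(prefix: str, suffix: str,
                 max_prefix_chars: int, max_suffix_chars: int) -> Tuple[str, str]:
    """Keep the text closest to the cursor, cutting on line boundaries."""
    if len(prefix) > max_prefix_chars:
        cut = len(prefix) - max_prefix_chars
        # If the cut lands mid-line, move it forward to the next line start.
        if prefix[cut - 1] != "\n":
            newline = prefix.find("\n", cut)
            if newline != -1:
                cut = newline + 1
        prefix = prefix[cut:]
    if len(suffix) > max_suffix_chars:
        cut = max_suffix_chars
        # Back up to the last line break inside the window, if there is one.
        newline = suffix.rfind("\n", 0, cut)
        if newline != -1:
            cut = newline + 1
        suffix = suffix[:cut]
    return prefix, suffix


class LineCache:
    """Hypothetical line-level cache that avoids repeat queries to the LLM."""

    def __init__(self) -> None:
        self._store: Dict[Tuple[str, int, str], str] = {}

    def get_or_fetch(self, file_path: str, line_number: int, line_text: str,
                     fetch: Callable[[], str]) -> str:
        key = (file_path, line_number, line_text)
        if key not in self._store:
            # Only call the model when this exact line state is new.
            self._store[key] = fetch()
        return self._store[key]
```

Keying the cache on the line text as well as the line number is a design choice made here so that an edited line triggers a fresh request.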
#### Code Generation
* uses Anthropic Claude
* two trigger scenarios - comment block with user instruction or function signature (dependent on language being used, based on regex specific to the language and not all languages supported)
* has some [template based prompting](https://gitlab.com/gitlab-org/gitlab/-/blob/d1a5dd0baf4fc75440ccd3c49bae0240171e14f1/ee/lib/code_suggestions/prompts/code_generation/vertex_ai.rb) with parameters (language, file path, prefix, extension, existing code instructions and existing code block)
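
As a rough illustration of the two trigger scenarios, the sketch below routes a request to code generation when the line before the cursor looks like a comment or a function signature, and to code completion otherwise. The patterns, the language table, and the function `select_task` are hypothetical; the real per-language regexes live in the linked task selector and are not reproduced here.

```python
import re
from typing import Dict, Pattern

# Hypothetical patterns for a couple of languages; unsupported languages
# simply have no entry and fall through to completion.
COMMENT_PATTERNS: Dict[str, Pattern[str]] = {
    "python": re.compile(r"^\s*#\s*\S"),
    "javascript": re.compile(r"^\s*//\s*\S"),
}
SIGNATURE_PATTERNS: Dict[str, Pattern[str]] = {
    "python": re.compile(r"^\s*def\s+\w+\s*\(.*\)\s*(->\s*[^:]+)?:\s*$"),
    "javascript": re.compile(r"^\s*function\s+\w+\s*\(.*\)\s*\{\s*$"),
}


def select_task(language: str, current_line: str) -> str:
    """Return "generation" or "completion" for the line before the cursor."""
    for patterns in (COMMENT_PATTERNS, SIGNATURE_PATTERNS):
        pattern = patterns.get(language)
        if pattern is not None and pattern.match(current_line):
            return "generation"
    return "completion"
```

For example, `select_task("python", "def add(a, b):")` selects generation, while a half-typed expression such as `x = compute(` falls back to completion.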
#### [Processing Highlights](https://internal.gitlab.com/handbook/product/ai-strategy/code-suggestions/#what-do-we-do-in-post-processing)
Pre-Processing
* handling of input that exceeds the token limit (500k characters)
* Tree-sitter parsing into an Abstract Syntax Tree (AST)
* prepend comment with file name and detected language
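
A minimal sketch of two of these pre-processing steps follows. It assumes that over-long input is truncated to the text nearest the cursor (the source does not say whether it is truncated or rejected), and the header wording, the function `preprocess`, and its parameters are hypothetical. The Tree-sitter parsing step is omitted.

```python
MAX_INPUT_CHARS = 500_000  # limit quoted in the list above


def preprocess(file_name: str, language: str, content: str,
               max_chars: int = MAX_INPUT_CHARS,
               comment_prefix: str = "#") -> str:
    """Truncate over-long input and prepend a file/language comment."""
    if len(content) > max_chars:
        # Assumption: keep the most recent text, closest to the cursor.
        content = content[-max_chars:]
    header = f"{comment_prefix} file: {file_name}, language: {language}\n"
    return header + content
```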
Post-Processing
* confidence score (must be above x threshold)
* remove completions that are only comments
* trim completions
* clean
* remove whitespace completions
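
The post-processing filters above can be sketched as a single function that either returns a cleaned completion or drops it. The function `postprocess`, the comment markers it recognises, and the choice to trim only trailing whitespace are assumptions for illustration; the unspecified "clean" step is not modelled.

```python
import re
from typing import Optional

# Assumption: only "#" and "//" style line comments are recognised here.
COMMENT_LINE = re.compile(r"^\s*(#|//)")


def postprocess(completion: str, score: float, threshold: float) -> Optional[str]:
    """Return a trimmed completion, or None if it should be discarded."""
    # The confidence score must be strictly above the threshold.
    if score <= threshold:
        return None
    # Drop completions that contain only whitespace.
    if not completion.strip():
        return None
    # Drop completions whose non-blank lines are all comments.
    lines = [line for line in completion.splitlines() if line.strip()]
    if all(COMMENT_LINE.match(line) for line in lines):
        return None
    # Trim trailing whitespace but keep leading indentation intact.
    return completion.rstrip()
```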