Self-Hosted Model Deployment - Code Suggestions (#13730) · Epics · GitLab.org

Self-Hosted Model Deployment - Code Suggestions

This epic captures the current plan and iteration cadence for Custom Model support for self-hosted (open source model-based) Code Suggestions. Our iteration will follow a data-driven flow, predicated upon:

* identifying open source, licensable models fit for the Code Generation and Code Completion use cases
* validating and baselining performance of our selected open source model against the Code Generation and Code Completion use cases
* prompt iteration and testing

Work supporting Code Suggestions GA has been pulled out into [this epic](https://gitlab.com/groups/gitlab-org/-/epics/15176), and will be moved back to this base epic following GA. Additional information on the timeline for Code Suggestions support can be found on the Custom Models [roadmap](https://gitlab.com/groups/gitlab-org/-/epics/14665).

### References

* [Code Suggestions GA Epic](https://gitlab.com/groups/gitlab-org/-/epics/15176)
* [Code Suggestions Prompts in the AIG](https://gitlab.com/gitlab-org/modelops/applied-ml/code-suggestions/ai-assist/-/tree/670ea7fc9c2f8c7912b550cde166a9d59386c9d5/ai_gateway/prompts/definitions/code_suggestions)

# Validation/Quality Results

* [Code Suggestions Validation Results](https://docs.google.com/spreadsheets/d/1qdVO1yQhFPIzRzRrHzDVcu3aNGDgXTVaYRlYJH-0K0A/edit?gid=553333807#gid=553333807)

<table> <tr> <th rowspan="2">Model</th> <th rowspan="2">GA?</th> <th colspan="2">Code Suggestions</th> </tr> <tr> <th> Code Generation cosine similarity </th> <th>Code Completion</th> </tr> <tr> <th>correctness score</th> <th></th> <th>model similarity score</th> <th>model similarity score</th> </tr> <tr> <td> **Claude 3.5 Sonnet on AWS Bedrock** </td> <td> :white_check_mark: </td> <td> [.89](https://gitlab.com/gitlab-org/gitlab/-/issues/485605#note_2150198467) </td> <td> [.92](https://gitlab.com/gitlab-org/gitlab/-/issues/495789#note_2163384828) </td> </tr> <tr> <td>Claude 3 Haiku on AWS Bedrock</td> <td> :x: </td> <td> 
[pending](https://gitlab.com/gitlab-org/gitlab/-/issues/504436) </td> <td> [pending](https://gitlab.com/gitlab-org/gitlab/-/issues/504437) </td> </tr> <tr> <td> **Mistral 7B-it** </td> <td> :white_check_mark: </td> <td> [.86](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/153423#note_1914296459) </td> <td> [.77](https://gitlab.com/gitlab-org/gitlab/-/issues/475104#note_2085789984) </td> </tr> <tr> <td>Mistral 7B v3</td> <td> :x: </td> <td> [pending](https://gitlab.com/gitlab-org/gitlab/-/issues/499889) </td> <td> [.79](https://gitlab.com/gitlab-org/gitlab/-/issues/475104#note_2085789984) </td> </tr> <tr> <td> **Mistral 8x7B-it v1** </td> <td> :white_check_mark: </td> <td> [.88](https://gitlab.com/gitlab-org/gitlab/-/issues/455303#note_1928017257) </td> <td> [.74](https://gitlab.com/gitlab-org/gitlab/-/issues/475104#note_2085789984) </td> </tr> <tr> <td>Mistral 8x7B v1</td> <td> :x: </td> <td> [pending](https://gitlab.com/gitlab-org/gitlab/-/issues/499889) </td> <td> [.81](https://gitlab.com/gitlab-org/gitlab/-/issues/475104#note_2085789984) </td> </tr> <tr> <td> **Mistral 8x22B-it** </td> <td> :white_check_mark: </td> <td> [.89](https://gitlab.com/gitlab-org/gitlab/-/issues/455303#note_1933176174) </td> <td> [.88](https://gitlab.com/gitlab-org/gitlab/-/issues/493932#note_2185430542) </td> </tr> <tr> <td>Mistral 8x22B</td> <td> :x: </td> <td> [pending](https://gitlab.com/gitlab-org/gitlab/-/issues/499889) </td> <td> [.75](https://gitlab.com/gitlab-org/gitlab/-/issues/493932#note_2185430542) / [pending](https://gitlab.com/gitlab-org/gitlab/-/issues/502692) </td> </tr> <tr> <td> **Codestral 22B-it** </td> <td> :white_check_mark: </td> <td> [.83](https://gitlab.com/gitlab-org/modelops/applied-ml/code-suggestions/ai-assist/-/merge_requests/1772) </td> <td> [.83](https://gitlab.com/gitlab-org/modelops/applied-ml/code-suggestions/ai-assist/-/merge_requests/1772) </td> </tr> <tr> <td> **gpt-4-turbo** </td> <td> :white_check_mark: </td> <td> 
[.89](https://gitlab.com/gitlab-org/gitlab/-/issues/473745#note_2150412307) </td> <td> [.91](https://gitlab.com/gitlab-org/gitlab/-/issues/473746#note_2163718822) </td> </tr> <tr> <td> **gpt-4o-mini** </td> <td> :white_check_mark: </td> <td> [.89](https://gitlab.com/gitlab-org/gitlab/-/issues/473745#note_2150412307) </td> <td> [.89](https://gitlab.com/gitlab-org/gitlab/-/issues/473746#note_2163718822) </td> </tr> <tr> <td> **gpt-4** </td> <td> :x: </td> <td> [not supported on AzureOpenAI](https://gitlab.com/gitlab-org/gitlab/-/issues/473746#note_2170640563) </td> <td> [not supported on AzureOpenAI](https://gitlab.com/gitlab-org/gitlab/-/issues/473746#note_2170640563) </td> </tr> <tr> <td> **gpt-4o** </td> <td> :white_check_mark: </td> <td> [.89](https://gitlab.com/gitlab-org/gitlab/-/issues/473745#note_2150412307) </td> <td> [.90](https://gitlab.com/gitlab-org/gitlab/-/issues/473746#note_2163718822) </td> </tr> <tr> <td> **gpt-3.5-turbo** </td> <td> :x: </td> <td> [.87](https://gitlab.com/gitlab-org/gitlab/-/issues/473745#note_2150412307) </td> <td> [.89](https://gitlab.com/gitlab-org/gitlab/-/issues/473746#note_2163718822) </td> </tr> <tr> <td>Meta Llama-3-70B-Instruct</td> <td> :x: </td> <td> [pending](https://gitlab.com/gitlab-org/gitlab/-/issues/512670) </td> <td> [pending](https://gitlab.com/gitlab-org/gitlab/-/issues/475101) </td> </tr> <tr> <td>Meta Llama-3-8B-Instruct</td> <td> :x: </td> <td> [pending](https://gitlab.com/gitlab-org/gitlab/-/issues/512670) </td> <td> [pending](https://gitlab.com/gitlab-org/gitlab/-/issues/475101) </td> </tr> <tr> <td>Meta Llama-3.1-8B-Instruct</td> <td> :x: </td> <td> [pending](https://gitlab.com/gitlab-org/gitlab/-/issues/512670) </td> <td> [pending](https://gitlab.com/gitlab-org/gitlab/-/issues/475101) </td> </tr> <tr> <td>Meta Llama-3.1-70B-Instruct</td> <td> :x: </td> <td> [pending](https://gitlab.com/gitlab-org/gitlab/-/issues/512670) </td> <td> [pending](https://gitlab.com/gitlab-org/gitlab/-/issues/475101) </td> </tr> 
<tr> <td>Code Gemma 7B-it</td> <td> :x: </td> <td> [.88](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/155899#evaluation-results) </td> <td>x</td> </tr> <tr> <td>Code Gemma 7B</td> <td> :x: </td> <td></td> <td> [0.70](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/153966#evaluation-results) </td> </tr> <tr> <td>Code Gemma 2b</td> <td> :x: </td> <td></td> <td> [.72](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/153966#evaluation-results) </td> </tr> <tr> <td>Code Llama 13B</td> <td> :x: </td> <td> [.88](https://gitlab.com/gitlab-org/gitlab/-/issues/467439#note_1990748005) </td> <td> [.73](https://gitlab.com/gitlab-org/gitlab/-/issues/467438#note_1993498296) </td> </tr> <tr> <td>Code Llama 7B</td> <td> :x: </td> <td></td> <td> [.74](https://gitlab.com/gitlab-org/gitlab/-/issues/467438#note_1993498296) </td> </tr> <tr> <td>DeepSeekCoder 1.3B base</td> <td> :x: </td> <td> [.7621](https://gitlab.com/gitlab-org/gitlab/-/issues/471074#note_2035717648) </td> <td> [.8083](https://gitlab.com/gitlab-org/gitlab/-/issues/471074) </td> </tr> <tr> <td>DeepSeekCoder 6.7B base</td> <td> :x: </td> <td> [.7822](https://gitlab.com/gitlab-org/gitlab/-/issues/471074#note_2035717648) </td> <td> [.8219](https://gitlab.com/gitlab-org/gitlab/-/issues/471074) </td> </tr> <tr> <td>DeepSeekCoder 7B base</td> <td> :x: </td> <td></td> <td></td> </tr> <tr> <td>DeepSeekCoder 33B base</td> <td> :x: </td> <td> [.802](https://gitlab.com/gitlab-org/gitlab/-/issues/471074#note_2035717648) </td> <td> [.8146](https://gitlab.com/gitlab-org/gitlab/-/issues/471074) </td> </tr> <tr> <td>DeepSeekCoder 1.3B-it</td> <td> :x: </td> <td> [.8034](https://gitlab.com/gitlab-org/gitlab/-/issues/471074) </td> <td></td> </tr> <tr> <td>DeepSeekCoder 6.7B-it</td> <td> :x: </td> <td> [.82](https://gitlab.com/gitlab-org/gitlab/-/issues/471074) </td> <td> [.7546](https://gitlab.com/gitlab-org/gitlab/-/issues/471074) </td> </tr> <tr> <td>DeepSeekCoder 7B-it</td> <td> :x: </td> <td></td> <td></td> </tr> 
<tr> <td>DeepSeekCoder 33B-it</td> <td> :x: </td> <td> [.81](https://gitlab.com/gitlab-org/gitlab/-/issues/471074) </td> <td></td> </tr> </table>

## References

* [Code Suggestion Prompts](https://gitlab.com/gitlab-org/modelops/applied-ml/code-suggestions/ai-assist/-/tree/main/ai_gateway/prompts/definitions/code_suggestions)

# Background

[Code Suggestions](https://internal.gitlab.com/handbook/product/ai-strategy/code-suggestions/) can currently be understood through its two main use cases: code completion (fill in the middle) and code generation (generated from a comment block or function signature). Each use case has its own model. The decision about [which model](https://gitlab.com/gitlab-org/gitlab/-/blob/5836b418aebefe4cc93f072d61c615ea8a104453/ee/lib/code_suggestions/task_selector.rb#L30) to trigger [occurs in the IDE extension via Tree-sitter](https://gitlab.com/groups/gitlab-org/-/epics/11568).

[Tree-sitter](https://tree-sitter.github.io/tree-sitter/) is an incremental parsing library that can build a concrete syntax tree for a source file. It uses a plugin paradigm, allowing many different programming languages to be parsed and analyzed through a single query interface.

Telemetry is currently collected in the IDE (and can only be collected there). For each use case, we will need to consider the pre- and post-processing steps currently integrated into the Code Suggestions flow, as well as prompting.
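To make the split between the two use cases concrete, here is a minimal sketch of routing a request to code generation or code completion based on the text before the cursor. It is an illustration only: the function name `select_task` and both regular expressions are hypothetical, cover a Python-like language only, and are a simplified stand-in for the Tree-sitter based detection in the IDE extension, which is not reproduced here.

```python
import re

# Hypothetical patterns for a Python-like language only. The real detection
# is language-specific and is not reproduced here.
COMMENT_LINE = re.compile(r"^\s*#\s*\S")
EMPTY_FUNCTION_SIGNATURE = re.compile(
    r"^\s*def\s+\w+\s*\(.*\)\s*(->\s*[^:]+)?:\s*$"
)


def select_task(prefix: str) -> str:
    """Return "generation" or "completion" for the text before the cursor."""
    # Only the last non-blank line before the cursor is inspected.
    lines = [line for line in prefix.splitlines() if line.strip()]
    if not lines:
        return "completion"
    last = lines[-1]
    # A comment with an instruction, or a bare function signature, is
    # treated as a request to generate code.
    if COMMENT_LINE.match(last) or EMPTY_FUNCTION_SIGNATURE.match(last):
        return "generation"
    # Anything else is treated as fill-in-the-middle completion.
    return "completion"
```

For example, `select_task("# write a function that adds two numbers\n")` routes to generation, while `select_task("total = a +")` routes to completion.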
#### Code Completion

* uses code-gecko (designed for code completion), which has a smaller input token limit
* also uses Anthropic with associated [template based prompting](https://gitlab.com/gitlab-org/gitlab/-/blob/84246fc668fcd3c70773b2b39cedf28ac9e1e261/ee/lib/code_suggestions/prompts/code_completion/anthropic.rb)
* context is the entire file (what is before and after), but may not include the whole file due to window limitations
* trigger is the update on the editor input
* cached at the line level to avoid repeat queries to the LLM
* currently no prompting from our end
* pre-processing: context trimming (trims from the start of the line in a window before/after)
* post-processing

#### Code Generation

* uses Anthropic Claude
* two trigger scenarios: a comment block with a user instruction, or a function signature (dependent on the language being used; based on regex specific to the language, and not all languages are supported)
* has some [template based prompting](https://gitlab.com/gitlab-org/gitlab/-/blob/d1a5dd0baf4fc75440ccd3c49bae0240171e14f1/ee/lib/code_suggestions/prompts/code_generation/vertex_ai.rb) with parameters (language, file path, prefix, extension, existing code instructions and existing code block)

#### [Processing Highlights](https://internal.gitlab.com/handbook/product/ai-strategy/code-suggestions/#what-do-we-do-in-post-processing)

Pre-Processing

* handling input that exceeds the token limit (500k characters)
* Tree-sitter parsing into an Abstract Syntax Tree (AST)
* prepend comment with file name and detected language

Post-Processing

* confidence score (must be above x threshold)
* remove completions that are only comments
* trim completions
* clean
* remove whitespace completions
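The pre- and post-processing steps listed above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the AI Gateway implementation: the function names, the header comment format, the `comment_marker` parameter, and the choice to keep the text nearest the cursor when truncating are all hypothetical. The threshold is left as a parameter because the source does not give its value.

```python
from typing import Optional

MAX_INPUT_CHARS = 500_000  # limit quoted in the pre-processing list


def build_prompt_prefix(file_name: str, language: str, prefix: str,
                        comment_marker: str = "#") -> str:
    """Prepend a file/language comment and cap the input length."""
    # Hypothetical header format; the real one is not shown in the source.
    header = f"{comment_marker} File: {file_name} (language: {language})\n"
    # Assumption: when the input is too long, keep the text closest
    # to the cursor and drop the oldest part.
    budget = max(MAX_INPUT_CHARS - len(header), 0)
    trimmed = prefix[-budget:] if budget else ""
    return header + trimmed


def post_process(completion: str, score: float, threshold: float,
                 comment_marker: str = "#") -> Optional[str]:
    """Return the cleaned completion, or None if it should be dropped."""
    # Confidence score must be above the threshold.
    if score <= threshold:
        return None
    # Drop whitespace-only completions.
    if not completion.strip():
        return None
    # Drop completions that consist only of comments.
    non_blank = [line for line in completion.splitlines() if line.strip()]
    if all(line.lstrip().startswith(comment_marker) for line in non_blank):
        return None
    # Trim trailing whitespace (assumed meaning of "trim completions").
    return completion.rstrip()
```

With these definitions, `post_process("return a + b\n", 0.9, 0.5)` yields `"return a + b"`, while a comment-only or whitespace-only completion yields `None`.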
