Local Model + Inference Using NVIDIA NGC
This issue is to evaluate use of a local model via [NVIDIA NGC](https://catalog.ngc.nvidia.com/models?filters=&orderBy=weightPopularDESC&query=&page=&pageSize=) as a compatible platform for GitLab Duo Self-Hosted.

### **Definition of Done**

* [ ] At least one model currently supported by GitLab Duo Self-Hosted can be used as a locally deployed model from the NVIDIA NGC catalogue to support the feature on the platform.
* [ ] Fewer than 20% poor answers (defined as 1s and 2s from an LLM judge, or a cosine similarity below 0.8) are achieved with each supported model in those areas for which supporting validation datasets exist.
* [ ] Quality results, based on LLM judge scores (1-4) and/or cosine similarity, are recorded in this issue's comments as distributions: for LLM judge scores, this means buckets of 1s, 2s, 3s, and 4s; for cosine similarity, buckets of 0.9 and above, 0.8-0.89, 0.7-0.79, and so on.
* [ ] The platform has been added to the Compatible Catalogue issue.
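As a minimal sketch of the reporting format above, the distributions and poor-answer rate could be computed along these lines. The function names and the exact bucket labels are assumptions for illustration, not an existing evaluation harness:

```python
import math
from collections import Counter

POOR_THRESHOLD = 0.20  # target: fewer than 20% poor answers


def summarize_judge(scores):
    """Distribution of LLM-judge scores (1-4) and the poor-answer rate (1s and 2s)."""
    dist = Counter(scores)
    poor_rate = (dist[1] + dist[2]) / len(scores)
    return dict(sorted(dist.items())), poor_rate


def summarize_cosine(scores):
    """Distribution of cosine similarities in 0.1-wide buckets (0.9+, 0.8-0.89, ...)
    and the rate of scores below the 0.8 quality threshold."""
    def bucket(s):
        if s >= 0.9:
            return "0.9+"
        # Epsilon guards against float artifacts such as 0.7 * 10 == 6.999...
        lo = math.floor(s * 10 + 1e-9) / 10
        return f"{lo:.1f}-{lo + 0.09:.2f}"

    dist = Counter(bucket(s) for s in scores)
    poor_rate = sum(1 for s in scores if s < 0.8) / len(scores)
    return dict(dist), poor_rate
```

For example, judge scores `[4, 4, 3, 2, 1, 4, 3, 3]` give a 25% poor-answer rate (failing the threshold), while cosine scores `[0.95, 0.91, 0.85, 0.72, 0.88]` bucket into `{"0.9+": 2, "0.8-0.89": 2, "0.7-0.79": 1}`.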