# Seed Compatible Model / Platform Catalogue
This issue captures the work to seed a compatible model/platform catalogue that helps Self-Hosted admins get started exploring compatible models and platforms.
Custom Models will validate the performance of at least 10 models and platforms and add them to the compatible catalogue. Compatibility validation will occur within Evaluation Runner where possible, and may be supplemented with manual or automated testing as devised in https://gitlab.com/gitlab-org/gitlab/-/issues/554895+s
Options for evaluation **in order of priority** include:
**Platforms**
* OpenAI (direct API access)
* [Google Vertex](https://cloud.google.com/vertex-ai)
* [Grok](https://x.ai/api)
* [IBM Watsonx.ai](https://www.ibm.com/products/watsonx-ai/foundation-models)
* [NVIDIA NIM](https://build.nvidia.com/models?ncid=pa-srch-goog-912323-API-Brand-Exact)
* Anthropic (direct API access)
* [ollama](https://ollama.com/)
**Models**
* [Google Gemini 2.5 Flash](https://cloud.google.com/vertex-ai/generative-ai/docs/models/gemini/2-5-flash)
* [Google Gemini 2.5 Pro](https://cloud.google.com/vertex-ai/generative-ai/docs/models/gemini/2-5-pro)
* [Qwen 2.5 Coder](https://huggingface.co/Qwen/Qwen2.5-Coder-32B-Instruct)
* [Mistral Devstral Small](https://huggingface.co/mistralai/Devstral-Small-2505)
* [Llama-3_3-Nemotron-Super-49B-v1](https://huggingface.co/nvidia/Llama-3_3-Nemotron-Super-49B-v1)
* Mixtral 8x7B (see https://gitlab.com/gitlab-org/gitlab/-/issues/517649)
* Mistral 7B-it (see https://gitlab.com/gitlab-org/gitlab/-/issues/517649)
* Mixtral 8x22B (see https://gitlab.com/gitlab-org/gitlab/-/issues/517649)
### Definition of Done
* [ ] Each model/platform can be used to support Self-Hosted Duo features
* [ ] Examine individual inputs and outputs using the [base/generic prompt](https://gitlab.com/gitlab-org/gitlab/-/issues/517581) for each feature.
* [ ] For those Duo features that have validation datasets, record quality results in this issue's comments as distributions, based on LLM Judge scores (1-4) and/or cosine similarity. For LLM Judge scores, this means buckets of 1s, 2s, 3s, and 4s; for cosine similarity, buckets of 0.9 and above, 0.8-0.89, 0.7-0.79, and so on.
* [ ] For those Duo features without validation datasets, use the [public validation script](https://gitlab.com/gitlab-org/gitlab/-/issues/554895).
* [ ] If the proposed model/platform passes baseline requirements, add it to the [Compatible Model Catalog](https://gitlab.com/gitlab-org/gitlab/-/issues/554595).
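The score distributions described above could be produced with a small helper like the following. This is a minimal sketch, not part of any existing tooling; the function names, the input shapes (a list of 1-4 judge scores, a list of 0-1 cosine similarities), and the 0.1-wide similarity bands are assumptions for illustration.

```python
# Hypothetical sketch: bucket evaluation results into the distributions
# recorded in issue comments. Assumes judge scores are ints 1-4 and
# cosine similarities are floats in [0, 1].
from collections import Counter

def judge_distribution(scores):
    """Count how many responses received each LLM Judge score (1-4)."""
    counts = Counter(scores)
    return {score: counts.get(score, 0) for score in (1, 2, 3, 4)}

def similarity_distribution(similarities):
    """Bucket cosine similarities into 0.1-wide bands: 0.9+, 0.8-0.89, ..."""
    buckets = Counter()
    for s in similarities:
        lower = min(int(s * 10) / 10, 0.9)  # clamp 1.0 into the 0.9+ bucket
        label = "0.9+" if lower == 0.9 else f"{lower:.1f}-{lower + 0.09:.2f}"
        buckets[label] += 1
    return dict(buckets)

print(judge_distribution([4, 4, 3, 2, 4, 1]))
# → {1: 1, 2: 1, 3: 1, 4: 3}
print(similarity_distribution([0.95, 0.91, 0.84, 0.72, 1.0]))
# → {'0.9+': 3, '0.8-0.89': 1, '0.7-0.79': 1}
```

Posting the two dictionaries per model/feature pair would be enough for readers to compare candidates at a glance.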