Add OpenAI GPT OSS Model on Evaluation Runner & Run Evals

Release Note

description: You can now use more supported models with GitLab Duo Self-Hosted, including open source (OS) OpenAI GPT models and Anthropic Claude 4. OpenAI GPT OSS 20B and 120B are supported for use with GitLab Duo Self-Hosted on vLLM, Azure OpenAI, and AWS Bedrock. Claude 4 is supported on AWS Bedrock. Provide feedback on these models in issues #560016 and #550190 (closed).

documentation: 'https://docs.gitlab.com/administration/gitlab_duo_self_hosted/supported_models_and_hardware_requirements/'

Details

This issue tracks adding the newly released OpenAI GPT OSS models to Evaluation Runner.

Models

License

Apache 2.0

Definition of Done

The above GPT OSS models have been added to Evaluation Runner (ER).
GitLab developers can run evaluations of their features against the GPT OSS models.
Each model has been run against the available evaluation datasets in ER.
The following identified bug has been addressed: #563341 (closed).
The traffic light system for self-hosted models has been updated to include scores, and the documentation has been updated to reflect any changes.

Edited Sep 02, 2025 by Susie Bitters