Add OpenAI GPT OSS Model on Evaluation Runner & Run Evals
Release Note
description: You can now use more supported models with GitLab Duo Self-Hosted, including the open source OpenAI GPT OSS models and Anthropic Claude 4. OpenAI GPT OSS 20B and 120B are supported for use with GitLab Duo Self-Hosted on vLLM, Azure OpenAI, and AWS Bedrock. Claude 4 is supported on AWS Bedrock. Provide feedback on these models in issues #560016 and #550190 (closed).
documentation: 'https://docs.gitlab.com/administration/gitlab_duo_self_hosted/supported_models_and_hardware_requirements/'
Details
This issue tracks adding the newly released OpenAI GPT OSS models to Evaluation Runner.
Models
License
- Apache 2.0
Definition of Done
- The above GPT OSS models have been added to Evaluation Runner
- GitLab developers can run evaluations on their features against the GPT OSS models
- Each model has been run against the available evaluation datasets in Evaluation Runner (ER)
- The following identified bug has been addressed/remediated: #563341
- The traffic light system for self-hosted models has been updated to include scores, and the documentation has been updated to reflect any changes
Edited by Susie Bitters