Spike: Add Mistral Devstral to Evaluation Runner on classiic features (Code suggestion and Chat) and DAP (Agentic Chat)

In order to support the validation of Self-Hosted models on Evaluation Runner as part of Self-Hosted platformization, we need to enable feature teams to test supported and relevant models.

Mistral Devstral-Small-2507 is a likely candidate within our supported model families for quality performance with Agentic Flows, with a score of 53% on SWE-Bench Verified. (For context, Claude Opus 4 got a 72% score on SWE-bench while Claude Haiku got a 40% score).

Devstral is released under the Apache 2.0 license

Devstral is not currently available on AWS Bedrock or Fireworks.ai. We need a feasible alternative to host the model for development by feature teams.

Model
  • Devstral-Small-2507
Supported platforms
  • vLLM

Definition of Done

  • Devstral Small has been added to Evaluation Runner for use by GitLab developers for Agentic features
  • Devstral has been evaluated for DAP use cases, as enabled in Agentic Feature Evaluation Platform (gitlab-org/modelops/ai-model-validation-and-research/ai-evaluation&54) • Unassigned
Edited Nov 04, 2025 by 🤖 GitLab Bot 🤖
Assignee Loading
Time tracking Loading