Devstral Small for DAP

This issue is to add Devstral Small (24B) support for Duo features, including DAP. Devstral is an agentic LLM for software engineering tasks, built in a collaboration between Mistral AI and All Hands AI. Devstral excels at using tools to explore codebases, editing multiple files, and powering software engineering agents. Its performance on SWE-bench positions it as the top open-source model on that benchmark.

It is fine-tuned from Mistral-Small-3.1 and therefore has a long context window of up to 128k tokens. As a coding agent, Devstral is text-only: the vision encoder was removed before fine-tuning from Mistral-Small-3.1.

Model

Devstral-Small-2507

License

Apache 2.0

Platforms

vLLM

Definition of Done

Each model can be used to support Duo features on all supported platforms
Examine individual inputs and outputs that scored poorly (scores of 1-2). Look for and document any patterns of either poor feature performance or poor LLM judge calibration. Iterate on the model prompt to eliminate patterns of poor performance.
Achieve less than 20% poor answers (defined as 1s and 2s from an LLM judge, or less than 0.8 cosine similarity) using each supported model for those areas in which we do have supporting validation datasets.
Quality results, based on LLM judge scores (1-4) and/or cosine similarity, are recorded in this issue's comments as distributions. For LLM judges, this means buckets of 1s, 2s, 3s, and 4s. For cosine similarity scores, this means buckets of 0.9 and above, 0.8-0.89, 0.7-0.79, and so on.
The traffic light system for self-hosted models has been updated to include scores, and the documentation has been updated to reflect any changes
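
The bucketing and the 20% threshold described above could be computed along these lines. This is only a sketch: the function names and bucket labels are hypothetical, the scores are assumed to be available as plain Python lists, and each metric is checked separately because the issue does not say how the two criteria should be combined.

```python
from collections import Counter

# Thresholds taken from the Definition of Done above.
POOR_JUDGE_SCORES = {1, 2}      # LLM judge scores counted as "poor"
COSINE_POOR_THRESHOLD = 0.8     # cosine similarity below this is "poor"
MAX_POOR_RATE = 0.20            # the poor-answer rate must be below this


def judge_distribution(scores):
    """Count how many answers received each LLM judge score 1-4."""
    counts = Counter(scores)
    return {score: counts.get(score, 0) for score in (1, 2, 3, 4)}


def cosine_bucket(similarity):
    """Map a similarity to a label: '0.9 and above', '0.8-0.89', '0.7-0.79', ..."""
    if similarity >= 0.9:
        return "0.9 and above"
    if similarity < 0:
        return "below 0.0"
    # The small epsilon guards against float error when scaling to tenths.
    tenth = int(similarity * 10 + 1e-9)
    return f"0.{tenth}-0.{tenth}9"


def cosine_distribution(similarities):
    """Count how many answers fall into each cosine similarity bucket."""
    return dict(Counter(cosine_bucket(s) for s in similarities))


def poor_rate_judge(scores):
    """Fraction of answers scored 1 or 2 by the LLM judge."""
    if not scores:
        raise ValueError("no judge scores provided")
    return sum(s in POOR_JUDGE_SCORES for s in scores) / len(scores)


def poor_rate_cosine(similarities):
    """Fraction of answers with cosine similarity below 0.8."""
    if not similarities:
        raise ValueError("no similarity scores provided")
    return sum(s < COSINE_POOR_THRESHOLD for s in similarities) / len(similarities)


def meets_quality_bar(poor_rate):
    """True when strictly fewer than 20% of answers are poor."""
    return poor_rate < MAX_POOR_RATE
```

For example, `judge_distribution([4, 4, 3, 2])` gives the four buckets to paste into a comment, and `meets_quality_bar(poor_rate_judge(scores))` reports whether a feature clears the bar for a given model. A rate of exactly 20% does not pass, since the requirement is "less than 20%".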

Edited Nov 04, 2025 by 🤖 GitLab Bot 🤖