Claude 3.7 Sonnet Duo Workflow rollout plan
Overview
Anthropic has released a new version of Claude Sonnet (3.7). We will migrate Duo Workflow to this version because all indications so far suggest it performs better on Duo Workflow-like tasks (see announcement).
Related to gitlab-org/gitlab#521034 (closed)
| Resource | Links |
|---|---|
| Model | |
| Epic or Issue | |
| Status updates | |
Rollout success criteria
- SWE-bench results are greater than 41%
Dashboard References
Legal notes
Add legal notes here
Known issue list
List of issues identified throughout the evaluation, implementation, and rollout of the model.
Rollout
Timeline
- Create an MR for the Duo Workflow service with the code changes needed to switch over to 3.7.
- Run the updated version of the Duo Workflow service against a subset of 50+ SWE-bench examples, drawn from the same subset we used before.
- Merge the MR and switch over in production if performance on this subset is equal to or better than 3.5.
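The merge gate in the last step can be sketched as a small comparison over per-instance results. This is only an illustration: the function names and the shape of the results (a mapping from instance ID to a resolved/unresolved flag) are assumptions, not the format the evaluation tooling actually produces.

```python
from typing import Mapping


def resolved_rate(results: Mapping[str, bool]) -> float:
    """Fraction of SWE-bench instances marked as resolved in one run."""
    if not results:
        raise ValueError("no results to evaluate")
    return sum(results.values()) / len(results)


def should_switch(baseline: Mapping[str, bool],
                  candidate: Mapping[str, bool]) -> bool:
    """Go/no-go check: the candidate (3.7) must match or beat the
    baseline (3.5) on exactly the same subset of instances."""
    if baseline.keys() != candidate.keys():
        raise ValueError("runs must cover the same subset of instances")
    return resolved_rate(candidate) >= resolved_rate(baseline)


# Hypothetical instance IDs and outcomes, for illustration only.
run_35 = {"case-1": True, "case-2": False, "case-3": False, "case-4": True}
run_37 = {"case-1": True, "case-2": True, "case-3": False, "case-4": True}
print(should_switch(run_35, run_37))
```

Requiring identical instance IDs keeps the comparison fair: a rate computed on a different subset would not be comparable to the earlier 3.5 run.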
Feedback from GitLab team members
Duo Workflow Dogfooding feedback
Pivot / Pause / Rollback Criteria
- SWE-bench evaluation performance is worse than 3.5.
- Availability issues that cause a reliability impact of >= 10%.
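As a rough sketch, the two criteria above could be checked together as follows. The function name, its parameters, and the way reliability impact is expressed (a fraction between 0 and 1) are assumptions made for illustration; how the impact is actually measured is not defined in this plan.

```python
from typing import List


def pivot_reasons(swe_rate_37: float,
                  swe_rate_35: float,
                  reliability_impact: float,
                  impact_threshold: float = 0.10) -> List[str]:
    """Return the criteria that are currently triggered.

    An empty list means neither pivot/pause/rollback criterion applies.
    All rates are fractions in [0, 1] (assumed representation).
    """
    reasons = []
    if swe_rate_37 < swe_rate_35:
        reasons.append("SWE-bench performance worse than 3.5")
    if reliability_impact >= impact_threshold:
        reasons.append("availability issues with >= 10% reliability impact")
    return reasons


# Hypothetical numbers, for illustration only.
print(pivot_reasons(swe_rate_37=0.40, swe_rate_35=0.42, reliability_impact=0.02))
```

Returning the list of triggered criteria, rather than a single boolean, makes it easier to record in a status update why a rollback was started.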
Mitigation and Rollback Plan
- Switching back to Claude 3.5 Sonnet should be an easy revert of the switch-over MR.
Edited by Sebastian Rehm