Claude 3.7 Sonnet Duo Workflow rollout plan

Overview

Anthropic has released a new version of Claude Sonnet (3.7). We will migrate Duo Workflow to this version because all indications suggest it performs better on Duo Workflow-like tasks (see announcement).

Related to gitlab-org/gitlab#521034 (closed)

Resource Links
Model

https://www.anthropic.com/news/claude-3-7-sonnet

Epic or Issue

gitlab-org/gitlab#521034 (closed)

Status updates

Rollout success criteria

  • SWE-bench results are greater than 41%

Dashboard References

Legal notes

Add legal notes here

Known issue list

List of issues identified throughout the evaluation, implementation, and rollout of the model.

Rollout

Timeline

  1. Create an MR for Duo Workflow Service with the code changes needed to switch over to 3.7.
  2. Run the updated version of Duo Workflow Service against a subset of 50+ SWE-bench examples, drawn from the same subset we used before.
  3. Merge the MR and switch over in production if performance on this subset is equal to or better than 3.5.
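
The merge condition in step 3 can be sketched as a small comparison over per-example results. This is only an illustration: the names `resolve_rate` and `should_switch`, and the representation of a run as a mapping from example ID to a resolved flag, are assumptions and not part of the Duo Workflow Service or the SWE-bench tooling.

```python
from typing import Mapping


def resolve_rate(results: Mapping[str, bool]) -> float:
    """Return the fraction of benchmark examples marked as resolved."""
    if not results:
        raise ValueError("no results to score")
    return sum(results.values()) / len(results)


def should_switch(baseline: Mapping[str, bool], candidate: Mapping[str, bool]) -> bool:
    """Apply the step 3 condition: candidate is equal to or better than baseline.

    Both runs must cover the same example IDs, otherwise the comparison
    would not be made on the same subset.
    """
    if baseline.keys() != candidate.keys():
        raise ValueError("both runs must cover the same subset of examples")
    return resolve_rate(candidate) >= resolve_rate(baseline)


if __name__ == "__main__":
    # Hypothetical results, for illustration only.
    run_3_5 = {"example-1": True, "example-2": False, "example-3": False, "example-4": True}
    run_3_7 = {"example-1": True, "example-2": True, "example-3": False, "example-4": True}
    print("switch over:", should_switch(run_3_5, run_3_7))
```

Requiring identical example IDs keeps the 3.5 and 3.7 numbers comparable, which is the reason the plan reuses the earlier subset.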

Feedback from GitLab team members

Duo Workflow Dogfooding feedback

Pivot / Pause / Rollback Criteria

  1. SWE-bench evaluation performance is worse than with 3.5.
  2. Availability issues that lead to a reliability impact of >= 10%.
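
The two criteria above could be checked together as in the following sketch. It is a hedged example: `RolloutSnapshot`, `rollback_reasons`, and the field names are hypothetical, and expressing reliability impact as a fraction between 0.0 and 1.0 is an assumption, since this plan does not define how that figure is measured.

```python
from dataclasses import dataclass
from typing import List

# Threshold taken from criterion 2 (>= 10% reliability impact).
RELIABILITY_IMPACT_LIMIT = 0.10


@dataclass(frozen=True)
class RolloutSnapshot:
    """Hypothetical summary of the rollout state at one point in time."""

    baseline_resolve_rate: float   # SWE-bench subset result with 3.5
    candidate_resolve_rate: float  # SWE-bench subset result with 3.7
    reliability_impact: float      # assumed to be a fraction in [0.0, 1.0]


def rollback_reasons(snapshot: RolloutSnapshot) -> List[str]:
    """Return the pivot / pause / rollback criteria that currently apply."""
    reasons: List[str] = []
    if snapshot.candidate_resolve_rate < snapshot.baseline_resolve_rate:
        reasons.append("SWE-bench evaluation performance is worse than with 3.5")
    if snapshot.reliability_impact >= RELIABILITY_IMPACT_LIMIT:
        reasons.append("reliability impact is at or above 10%")
    return reasons


if __name__ == "__main__":
    # Hypothetical numbers, for illustration only.
    snapshot = RolloutSnapshot(
        baseline_resolve_rate=0.40,
        candidate_resolve_rate=0.44,
        reliability_impact=0.12,
    )
    for reason in rollback_reasons(snapshot):
        print("rollback criterion met:", reason)
```

An empty list means neither criterion applies; any entry is a signal to consider the mitigation and rollback plan.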

Mitigation and Rollback Plan

  • Switching back to Sonnet 3.5 should be an easy revert of the switch-over MR.
Edited by Sebastian Rehm