Claude 4.0 Sonnet Duo Chat Rollout Plan

Overview

Anthropic has released a new version of Claude Sonnet (4.0). We want to add Claude 4.0 Sonnet support to Duo Chat.

Related to #545117 (closed)

Resource	Links
Model	https://www.anthropic.com/news/claude-4
Epic or Issue	#545117 (closed)
Feature Flag Rollout Issue	#546256 (closed)
Status updates	#545491 (comment 2542506119)

Rollout success criteria

Equal or greater acceptance rate for code generation requests, with equal or lower latency.

Dashboard References

https://10az.online.tableau.com/#/site/gitlab/views/PDCodeSuggestions/QualityMetrics?:iid=1

Legal notes

Add legal notes here

Known issue list

List of issues identified throughout the evaluation, implementation, and rollout of the model.

Rollout

Feedback from GitLab team members

Add link to the internal feedback issue.

Persevere / Continue Criteria

Latency remains equal to or lower than before
Acceptance rate remains greater than or equal to before
Nothing was raised as a blocker

Pivot / Pause / Rollback Criteria

Acceptance rate drops
Latency increases
Other blockers identified in testing
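The persevere and pivot criteria above amount to a single go/no-go rule: continue only if latency did not rise, acceptance rate did not fall, and no blocker was raised. A minimal sketch of that rule follows; the `RolloutMetrics` fields and the example values are hypothetical placeholders, not measured data, and the strict comparisons assume no noise tolerance.

```python
from dataclasses import dataclass


@dataclass
class RolloutMetrics:
    """Hypothetical snapshot of the tracked metrics (field names are illustrative)."""
    acceptance_rate: float  # fraction of accepted code generation requests, 0.0-1.0
    p50_latency_s: float    # median latency in seconds
    blockers: int = 0       # number of blocking issues raised in testing


def rollout_decision(before: RolloutMetrics, after: RolloutMetrics) -> str:
    """Return "continue" only if every persevere criterion holds, else "pause_or_rollback"."""
    latency_ok = after.p50_latency_s <= before.p50_latency_s       # equal or lower latency
    acceptance_ok = after.acceptance_rate >= before.acceptance_rate  # equal or higher acceptance
    no_blockers = after.blockers == 0                               # nothing raised as a blocker
    if latency_ok and acceptance_ok and no_blockers:
        return "continue"
    return "pause_or_rollback"


# Placeholder values for illustration only.
baseline = RolloutMetrics(acceptance_rate=0.30, p50_latency_s=1.2)
candidate = RolloutMetrics(acceptance_rate=0.31, p50_latency_s=1.1)
print(rollout_decision(baseline, candidate))  # continue
```

In practice, exact comparisons like these may trigger a rollback on normal run-to-run noise, so a small tolerance band around the baseline may be worth agreeing on before the rollout starts.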

Evaluation result

LangSmith Experiment: View Results

Evaluation Pipeline	Model	Quality Metric	P50 Latency	P99 Latency
gitlab-docs context-qa	claude-4-0-sonnet (baseline)	accuracy: 2.95	22.5s	40.0s
gitlab-docs context-qa	claude-3-7-sonnet	accuracy: 3.05	20.0s	38.5s
Change (claude-3-7-sonnet vs. baseline)		+3.4% accuracy	-11% (faster)	-4% (faster)
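The Change row is the relative difference of claude-3-7-sonnet against the claude-4-0-sonnet baseline, `(candidate - baseline) / baseline`. A small sketch of that arithmetic, using the values from the table, shows the results are roughly +3.4% accuracy, -11.1% P50 latency, and -3.75% P99 latency; a negative value means the candidate's number is smaller.

```python
def percent_change(baseline: float, candidate: float) -> float:
    """Relative change of candidate vs. baseline, in percent (negative = smaller value)."""
    return (candidate - baseline) / baseline * 100.0


# Values copied from the evaluation table (claude-4-0-sonnet is the baseline).
baseline = {"accuracy": 2.95, "p50_s": 22.5, "p99_s": 40.0}
candidate = {"accuracy": 3.05, "p50_s": 20.0, "p99_s": 38.5}

for metric in baseline:
    change = percent_change(baseline[metric], candidate[metric])
    print(f"{metric}: {change:+.1f}%")
```

Because accuracy is higher-is-better and latency is lower-is-better, a positive accuracy change and negative latency changes both favor claude-3-7-sonnet in this run.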

Observations:

claude-3-7-sonnet showed improvements across all measured metrics compared to claude-4-0-sonnet.
claude-4-0-sonnet appears to be slightly worse on the context-qa evaluation. Many of the outputs are similar, with only slight deviations.
This wasn't tested with duo_chat_react_agent_claude_4_0 enabled. Having the ReAct agent enabled with claude-4-0-sonnet may improve the results slightly.


Rollout success criteria

No degradation occurs to users on gitlab.com.

Dashboard References

Legal notes

Add legal notes here

Known issue list

List of issues identified throughout the evaluation, implementation, and rollout of the model.

Rollout

Timeline

Segment	Date	Audience	Status	Note
Duo React Agent (Claude 4.0 Sonnet)	June 4th	All GitLab team members	✅	Internal testing of Claude 4.0 #546256 (comment 2542529912)
Duo React Agent (Claude 4.0 Sonnet)	June 5th	Globally rolled out	✅	Globally enabled feature flag.
Duo Chat GitLab-Doc Question-Answering	June 8th	No incremental rollout for the feature	✅	Plan to apply Claude 4.0 Sonnet to `gitlab-docs`. This change isn't behind a feature flag. Eval testing link: #545491 (comment 2535550490)
Duo Chat identifier Tools	June 10th	All GitLab team members	✅	Merged within larger MR gitlab-org/modelops/applied-ml/code-suggestions/ai-assist!2697 (merged)
Duo Chat identifier Tools	June 10th	Incremental rollout for all users and customers	✅	Merged within larger MR gitlab-org/modelops/applied-ml/code-suggestions/ai-assist!2697 (merged)

Feedback from GitLab team members

Add link to the internal feedback issue.

Pivot / Pause / Rollback Criteria

Poor performance in evaluations
Tool degradation

Mitigation and Rollback Plan

Example plan description:

We will use a feature flag to control the rollout. If any of the pivot, pause, or rollback criteria above are met, we will disable the feature flag, especially for external users, to investigate any potential issues.
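The rollback mechanism described above works because the model choice is read from a flag at request time, so disabling the flag reverts traffic without a deploy. The sketch below illustrates only that idea: an in-memory dict stands in for GitLab's feature flag service, and treating claude-3-7-sonnet as the fallback model is an assumption. The flag name is taken from the evaluation notes, but how it is checked here is hypothetical.

```python
from typing import Mapping

# Assumed fallback and new model labels (taken from the evaluation table).
DEFAULT_MODEL = "claude-3-7-sonnet"
NEW_MODEL = "claude-4-0-sonnet"
# Flag name from the evaluation notes; this lookup is a hypothetical stand-in.
FLAG = "duo_chat_react_agent_claude_4_0"


def select_model(flags: Mapping[str, bool], flag_name: str = FLAG) -> str:
    """Pick the model for a Duo Chat request; a missing flag is treated as disabled."""
    return NEW_MODEL if flags.get(flag_name, False) else DEFAULT_MODEL


flags = {FLAG: True}
print(select_model(flags))  # claude-4-0-sonnet
flags[FLAG] = False  # rollback: disable the flag
print(select_model(flags))  # claude-3-7-sonnet
```

Defaulting a missing flag to "disabled" keeps the fallback model in place if the flag state cannot be read, which is the safer failure mode during a rollout.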

Release Announcement

Add details here about where to make announcements when the model is ready for rollout to external users.

Edited Jun 11, 2025 by Nathan Weinshenker