Claude 4.0 Sonnet Duo Chat Rollout Plan
Overview
Anthropic has released a new version of Claude Sonnet (4.0). We want to add Claude 4.0 Sonnet support to Duo Chat.
Related to #545117 (closed)
| Resource | Links |
|---|---|
| Model | https://www.anthropic.com/news/claude-4 |
| Epic or Issue | #545117 (closed) |
| Feature Flag Rollout Issue | #546256 (closed) |
| Status updates | #545491 (comment 2542506119) |
Rollout success criteria
Acceptance rate for code generation requests is equal to or greater than before, with equivalent or lower latency.
Dashboard References
https://10az.online.tableau.com/#/site/gitlab/views/PDCodeSuggestions/QualityMetrics?:iid=1
Legal notes
Add legal notes here
Known issue list
List of issues identified throughout the evaluation, implementation, and rollout of the model.
Rollout
Feedback from GitLab team members
Add link to the internal feedback issue.
Persevere / Continue Criteria
- Latency remains equal to or lower than before
- Acceptance rate remains greater than or equal to before
- Nothing was raised as a blocker
Pivot / Pause / Rollback Criteria
- Acceptance rate drops
- Latency increases
- Other blockers identified in testing
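The continue and rollback criteria above can be read as a single check against the pre-rollout baseline. The following is a minimal Python sketch under stated assumptions: the `RolloutMetrics` type, the `rollout_decision` function, and the field names are hypothetical and do not correspond to any existing GitLab tooling, and a real check would likely allow some tolerance for measurement noise.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class RolloutMetrics:
    """Hypothetical summary of one measurement window."""
    acceptance_rate: float  # fraction of suggestions accepted, 0.0 to 1.0
    latency_s: float        # one latency summary (for example P50), in seconds


def rollout_decision(
    before: RolloutMetrics,
    after: RolloutMetrics,
    blockers: Sequence[str] = (),
) -> Tuple[str, List[str]]:
    """Return ("continue", []) or ("rollback", reasons) per the criteria above."""
    reasons: List[str] = []
    if after.acceptance_rate < before.acceptance_rate:
        reasons.append("acceptance rate dropped")
    if after.latency_s > before.latency_s:
        reasons.append("latency increased")
    # Any blocker raised during testing also triggers a pause or rollback.
    reasons.extend(f"blocker: {b}" for b in blockers)
    return ("rollback" if reasons else "continue", reasons)


if __name__ == "__main__":
    # Illustrative numbers only, not measured data.
    baseline = RolloutMetrics(acceptance_rate=0.30, latency_s=1.2)
    candidate = RolloutMetrics(acceptance_rate=0.31, latency_s=1.1)
    print(rollout_decision(baseline, candidate))
```

The sketch treats any single failed criterion as sufficient to pause, which matches how the two lists are written: every "continue" condition has to hold, while one "rollback" condition is enough.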
Evaluation result
Langsmith Experiment: View Results
| Evaluation Pipeline | Model | Quality Metric | P50 Latency | P99 Latency |
|---|---|---|---|---|
| gitlab-docs context-qa | claude-4-0-sonnet (baseline) | accuracy: 2.95 | 22.5s | 40.0s |
| gitlab-docs context-qa | claude-3-7-sonnet | accuracy: 3.05 | 20.0s | 38.5s |
| Change | claude-3-7-sonnet vs. baseline | +3.4% improvement | -11% faster | -4% faster |
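The figures in the Change row can be reproduced as the relative change of claude-3-7-sonnet against the claude-4-0-sonnet baseline. This is a minimal Python sketch that uses only the values from the table above; the helper name `relative_change` is illustrative and not part of any evaluation tooling.

```python
def relative_change(baseline: float, candidate: float) -> float:
    """Percentage change of candidate relative to baseline."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (candidate - baseline) / baseline * 100.0


# Values copied from the table: (claude-4-0-sonnet baseline, claude-3-7-sonnet)
metrics = {
    "accuracy": (2.95, 3.05),
    "p50_latency_s": (22.5, 20.0),
    "p99_latency_s": (40.0, 38.5),
}

if __name__ == "__main__":
    for name, (baseline, candidate) in metrics.items():
        print(f"{name}: {relative_change(baseline, candidate):+.1f}%")
```

Under this convention a positive accuracy change and a negative latency change both favor the non-baseline model.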
Observations:
- `claude-3-7-sonnet` showed improvements across all measured metrics compared to `claude-4-0-sonnet`.
- `claude-4-0-sonnet` appears to be slightly worse on context-qa. Many of the outputs are similar, with slight deviations.
- This wasn't tested with `duo_chat_react_agent_claude_4_0` enabled. Having the ReAct agent enabled with `claude-4-0-sonnet` may improve the results slightly.
Rollout success criteria
No degradation occurs to users on gitlab.com.
Dashboard References
Legal notes
Add legal notes here
Known issue list
List of issues identified throughout the evaluation, implementation, and rollout of the model.
Rollout
Timeline
| Segment | Date | Audience | Status | Note |
|---|---|---|---|---|
| Duo React Agent (Claude 4.0 Sonnet) | June 4th | All GitLab team members | | Internal testing of Claude 4.0 #546256 (comment 2542529912) |
| Duo React Agent (Claude 4.0 Sonnet) | June 5th | Globally rolled out | | |
| Duo Chat GitLab-Doc Question-Answering | June 8th | No incremental rollout for the feature | | Plan to apply Claude 4.0 Sonnet to Eval testing link: #545491 (comment 2535550490) |
| Duo Chat identifier Tools | June 10th | All GitLab team members | | Merged within larger MR gitlab-org/modelops/applied-ml/code-suggestions/ai-assist!2697 (merged) |
| Duo Chat identifier Tools | June 10th | Incremental rollout for all users and customers | | Merged within larger MR gitlab-org/modelops/applied-ml/code-suggestions/ai-assist!2697 (merged) |
Feedback from GitLab team members
Add link to the internal feedback issue.
Pivot / Pause / Rollback Criteria
- Poor performance in evaluations
- Tool degradation
Mitigation and Rollback Plan
Example plan description:
We will use a Feature Flag to control the rollout. If there are any concerns (see above), we will disable the feature flag, especially for external users, to investigate any potential issues.
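As an illustration of how a flag-gated rollout makes rollback a single switch, here is a minimal Python sketch. It is not the actual Duo Chat implementation: the `select_model` function and the idea of passing enabled flags as a set are hypothetical, and falling back to `claude-3-7-sonnet` is an assumption based on the model compared in the evaluation. Only the flag name `duo_chat_react_agent_claude_4_0` comes from this issue.

```python
from typing import AbstractSet

# Assumption: the previous model is what requests fall back to when the flag is off.
FALLBACK_MODEL = "claude-3-7-sonnet"
CANDIDATE_MODEL = "claude-4-0-sonnet"
ROLLOUT_FLAG = "duo_chat_react_agent_claude_4_0"


def select_model(enabled_flags: AbstractSet[str]) -> str:
    """Pick the model for a request based on the flags enabled for that actor."""
    if ROLLOUT_FLAG in enabled_flags:
        return CANDIDATE_MODEL
    return FALLBACK_MODEL


if __name__ == "__main__":
    print(select_model({ROLLOUT_FLAG}))  # flag on: candidate model
    print(select_model(set()))           # flag off (rollback): fallback model
```

Because the model choice depends only on the flag, disabling the flag returns every affected request to the fallback model without a deploy.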
Release Announcement
Add details here about where to make announcements when the model is ready for rollout to external users.