
Claude 4.0 Sonnet Duo Chat Rollout Plan

Overview

Anthropic has released a new version of Claude Sonnet (4.0). We want to make Claude 4.0 Sonnet available in Duo Chat.

Related to #545117 (closed)

Resource Links
Model: https://www.anthropic.com/news/claude-4
Epic or Issue: #545117 (closed)
Feature Flag Rollout Issue: #546256 (closed)
Status updates: #545491 (comment 2542506119)

Rollout success criteria

Equal or greater acceptance rate for code generation requests, with equivalent or lower latency.

Dashboard References

https://10az.online.tableau.com/#/site/gitlab/views/PDCodeSuggestions/QualityMetrics?:iid=1

Legal notes

Add legal notes here

Known issue list

List of issues identified throughout the evaluation, implementation, and rollout of the model.

Rollout

Feedback from GitLab team members

Add link to the internal feedback issue.

Persevere / Continue Criteria

  1. Latency remains equivalent to or lower than before
  2. Acceptance rate remains greater than or equal to before
  3. No blockers were raised

Pivot / Pause / Rollback Criteria

  1. Acceptance rate drops
  2. Latency increases
  3. Other blockers identified in testing
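The Persevere / Continue and Pivot / Pause / Rollback criteria above amount to comparing the candidate model against the current baseline. A minimal Python sketch, assuming both models' acceptance rate and P50 latency are measured over the same window; `Metrics` and `triggered_rollback_criteria` are hypothetical names, and the strict comparisons (no tolerance band) are an assumption, since the plan does not define how large a drop counts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Metrics:
    """Aggregate metrics for one model over the same evaluation window (hypothetical)."""
    acceptance_rate: float  # fraction of responses accepted, 0.0 to 1.0
    p50_latency_s: float    # median latency in seconds


def triggered_rollback_criteria(
    baseline: Metrics, candidate: Metrics, blockers: list[str]
) -> list[str]:
    """Return every Pivot / Pause / Rollback criterion that fired.

    An empty list means all Persevere / Continue criteria hold.
    Assumption: strict comparisons, no tolerance for noise.
    """
    reasons = []
    if candidate.acceptance_rate < baseline.acceptance_rate:
        reasons.append("acceptance rate dropped")
    if candidate.p50_latency_s > baseline.p50_latency_s:
        reasons.append("latency increased")
    # Any blocker raised during testing is a rollback reason on its own.
    reasons.extend(f"blocker: {b}" for b in blockers)
    return reasons
```

Returning the list of reasons rather than a single yes/no keeps the decision auditable: the rollout issue can record exactly which criterion triggered a pause.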

Evaluation result

LangSmith experiment: View Results

Evaluation Pipeline | Model | Quality Metric | P50 Latency | P99 Latency
gitlab-docs context-qa | claude-4-0-sonnet (baseline) | accuracy: 2.95 | 22.5s | 40.0s
gitlab-docs context-qa | claude-3-7-sonnet | accuracy: 3.05 | 20.0s | 38.5s
Change (claude-3-7-sonnet vs. baseline) | | +3.4% (higher accuracy) | -11% (faster) | -4% (faster)
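The Change row can be recomputed as the relative change of claude-3-7-sonnet against the claude-4-0-sonnet baseline. A small Python sketch; `relative_change_pct` is a hypothetical helper, and the numbers are copied from the evaluation table.

```python
def relative_change_pct(baseline: float, candidate: float) -> float:
    """Percent change of the candidate value relative to the baseline value."""
    return (candidate - baseline) / baseline * 100.0


# Values copied from the evaluation table (baseline: claude-4-0-sonnet).
baseline = {"accuracy": 2.95, "p50_latency_s": 22.5, "p99_latency_s": 40.0}
candidate = {"accuracy": 3.05, "p50_latency_s": 20.0, "p99_latency_s": 38.5}

for metric in baseline:
    change = relative_change_pct(baseline[metric], candidate[metric])
    # For accuracy, positive is better; for latency, negative means faster.
    print(f"{metric}: {change:+.1f}%")
```

Mathematically the changes are about +3.39% accuracy, -11.11% P50 latency, and -3.75% P99 latency, which correspond to the 3.4%, 11%, and 4% figures in the Change row after rounding.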

Observations:

  • claude-3-7-sonnet outperformed claude-4-0-sonnet on every measured metric: higher accuracy and lower P50 and P99 latency.
  • claude-4-0-sonnet appears slightly worse on context-qa. Many outputs are similar, with only slight deviations.
  • This evaluation was run without duo_chat_react_agent_claude_4_0 enabled. Enabling the ReAct agent with claude-4-0-sonnet may improve the results slightly.


Rollout success criteria

No degradation for users on gitlab.com.

Dashboard References

Legal notes

Add legal notes here

Known issue list

List of issues identified throughout the evaluation, implementation, and rollout of the model.

Rollout

Timeline

Segment | Date | Audience | Status | Note
Duo React Agent (Claude 4.0 Sonnet) | June 4th | All GitLab team members | | Internal testing of Claude 4.0: #546256 (comment 2542529912)
Duo React Agent (Claude 4.0 Sonnet) | June 5th | Globally rolled out | | Globally enabled feature flag.
Duo Chat GitLab-Doc Question-Answering | June 8th | No incremental rollout for the feature | | Plan to apply Claude 4.0 Sonnet to gitlab-docs. This change isn't behind a feature flag. Eval testing link: #545491 (comment 2535550490)
Duo Chat identifier Tools | June 10th | All GitLab team members | | Merged within larger MR gitlab-org/modelops/applied-ml/code-suggestions/ai-assist!2697 (merged)
Duo Chat identifier Tools | June 10th | Incremental rollout for all users and customers | | Merged within larger MR gitlab-org/modelops/applied-ml/code-suggestions/ai-assist!2697 (merged)
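The identifier-tools rows describe a staged rollout: GitLab team members first, then an incremental rollout to all users and customers. A common way to make a percentage rollout stable is to hash each actor into a fixed bucket. The Python sketch below illustrates that generic technique under stated assumptions; it is not GitLab's feature-flag implementation, and `in_rollout`, `model_enabled`, and the CRC32 bucketing are illustrative choices.

```python
import zlib


def in_rollout(flag_name: str, actor_id: str, percentage: int) -> bool:
    """Deterministically place an actor in a 0-99 bucket for this flag (hypothetical).

    The same actor always lands in the same bucket, so raising the percentage
    only adds actors and never switches already-enabled actors off.
    """
    if not 0 <= percentage <= 100:
        raise ValueError("percentage must be between 0 and 100")
    # CRC32 is stable across processes, unlike Python's salted built-in hash().
    bucket = zlib.crc32(f"{flag_name}:{actor_id}".encode()) % 100
    return bucket < percentage


def model_enabled(flag_name: str, actor_id: str, is_team_member: bool, percentage: int) -> bool:
    # Stage 1: all GitLab team members. Stage 2: an incremental percentage of everyone else.
    return is_team_member or in_rollout(flag_name, actor_id, percentage)
```

Seeding the hash with the flag name means different flags bucket users independently, so the same early users are not always the first to receive every change.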

Feedback from GitLab team members

Add link to the internal feedback issue.

Pivot / Pause / Rollback Criteria

  1. Poor performance in evaluations
  2. Tool degradation

Mitigation and Rollback Plan

Example plan description:

We will use a feature flag to control the rollout. If any of the rollback criteria above are met, we will disable the feature flag, especially for external users, and investigate the issue.
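Disabling the flag works as a rollback only if model selection falls back to the previous model when the flag is off. A minimal Python sketch of that kill switch, assuming flag state is available as a simple name-to-boolean mapping and that claude-3-7-sonnet is the fallback; `select_model` is a hypothetical helper, and `duo_chat_react_agent_claude_4_0` is used only as an example flag name taken from the evaluation notes.

```python
from typing import Mapping

NEW_MODEL = "claude-4-0-sonnet"
FALLBACK_MODEL = "claude-3-7-sonnet"  # assumption: the previously served model


def select_model(flags: Mapping[str, bool], flag_name: str) -> str:
    """Serve the new model only while its feature flag is enabled.

    A missing flag is treated as disabled, so rolling back is just
    turning the flag off; no deploy is needed.
    """
    return NEW_MODEL if flags.get(flag_name, False) else FALLBACK_MODEL
```

The gitlab-docs question-answering change is listed in the timeline as not behind a feature flag, so a switch like this would not cover it; rolling that change back would need a code change instead.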

Release Announcement

Add details here about where to make announcements when the model is ready for rollout to external users.

Edited by Nathan Weinshenker