Roll out Sonnet 3.7 behind feature flag
With Ensure Duo Workflow does not encounter more err... (#314 - closed), we'll have made sure that Claude 3.7 is as likely as 3.5 to complete a workflow run. In our last rollout, however, we also discovered that Duo Workflow behaves markedly differently with 3.7 than with 3.5.
The differences range from disambiguation triggering more often to Duo Workflow becoming much more "wordy" in its plans.
Desired Outcome
We've rolled out Sonnet 3.7 to a subset of internal users and collected feedback about it.
Implementation Plan
- Roll out 3.7 only to Duo Workflow team members and known internal power users (blocked by Replace env-vars with proper Feature Flag system (#295 - closed))
- Collect feedback about how Duo Workflow behavior differs
- Create issues for individual behavior problems that come up
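
To make the first step concrete, here is a minimal sketch of allowlist-based model selection. It is an illustration only and does not reflect the actual feature flag system from #295: the `ModelRollout` class, the model labels, and the usernames are all hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical model labels for illustration, not real API identifiers.
DEFAULT_MODEL = "sonnet-3.5"
CANDIDATE_MODEL = "sonnet-3.7"


@dataclass
class ModelRollout:
    """Hypothetical allowlist flag: only listed users get the candidate model."""

    enabled: bool = False
    allowed_users: set[str] = field(default_factory=set)

    def model_for(self, username: str) -> str:
        # Everyone outside the allowlist, or everyone when the flag is off,
        # stays on the default model.
        if self.enabled and username in self.allowed_users:
            return CANDIDATE_MODEL
        return DEFAULT_MODEL


# Hypothetical usernames standing in for team members and power users.
rollout = ModelRollout(enabled=True, allowed_users={"workflow_dev", "power_user"})
```

Keeping the default path untouched means the flag can be switched off to return every user to 3.5 if feedback turns up blocking problems.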
Edited by Sebastian Rehm