Switch back to Anthropic Claude 3.5
Problem
We recently switched to Claude Sonnet 3.7 (see #257 (closed)) because it showed promising results in output quality (e.g. as measured by SWE-bench).
However, since the switch we've been seeing two problems with it:
- Malformed tool calls, which can break our execution; see #269 (moved) for an issue working on this.
- An increased number of server errors (500) from Anthropic, which cause the whole workflow to fail if they happen too often. 33% of fatal errors in the last 5 days were due to this.
We also recently had to revert a fix for the first problem in !328 (merged), which shows that handling these responses from 3.7 is not trivial.
Desired Outcome
We have switched back to Sonnet 3.5 for now, and reliability has improved as a result.