Evaluation of Claude 3.7

Manual review of a sample subset showed good correlation between the LLM Judge and the human expert
The LLM Judge shows similar accuracy between Claude 3.5 and Claude 3.7
This review also uncovered a few pre-existing bugs, which have been reported in Vulnerability Resolution - MR diff patch genera... (&17227)

We are ready to switch to Claude 3.7 ✅

Edited Mar 27, 2025 by Meir Benayoun