
Evaluation of Claude 3.7

Execution Plan

  • Run CEF with Claude 3.7
  • Check that the logs show the correct prompt_version and parameters
  • Manual review on a small random sample
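
The log check and the sampling step in the plan could be scripted along the following lines. This is only a minimal sketch: it assumes the run logs have already been loaded as a list of dictionaries, and the function names, the record layout, and any expected values passed in are hypothetical, not taken from CEF.

```python
import random


def find_mismatches(records, expected):
    """Return (index, field, actual) for each logged field that differs from its expected value.

    `records` is assumed to be a list of dicts parsed from the run logs;
    `expected` maps field names (e.g. "prompt_version") to the values the run should show.
    """
    mismatches = []
    for i, rec in enumerate(records):
        for field, want in expected.items():
            got = rec.get(field)  # None if the field is missing from the log record
            if got != want:
                mismatches.append((i, field, got))
    return mismatches


def sample_for_review(records, k, seed=0):
    """Draw a reproducible random sample of up to k records for manual review."""
    rng = random.Random(seed)  # fixed seed so the same sample can be re-drawn later
    return rng.sample(records, min(k, len(records)))
```

An empty result from `find_mismatches` would indicate that every record carries the expected prompt_version and parameters. Fixing the seed lets a second reviewer re-draw the same sample.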

Resources

Conclusion

  • The manual review of a sample subset showed good correlation between the LLM Judge and the human expert
  • The LLM Judge shows similar accuracy with Claude 3.5 and Claude 3.7
  • This review also uncovered a few pre-existing bugs, which have been reported in Vulnerability Resolution - MR diff patch genera... (&17227)
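
One possible way to quantify how closely the LLM Judge tracks the human expert on the reviewed sample is raw agreement together with Cohen's kappa, which corrects for chance agreement. The sketch below is illustrative only: it assumes both verdicts are available as equal-length lists of categorical labels, and the function names are hypothetical. It is not necessarily the measure used in this review.

```python
from collections import Counter


def agreement_rate(judge, human):
    """Fraction of samples on which the judge and the human gave the same label."""
    if len(judge) != len(human) or not judge:
        raise ValueError("label lists must be non-empty and of equal length")
    return sum(j == h for j, h in zip(judge, human)) / len(judge)


def cohens_kappa(judge, human):
    """Cohen's kappa: observed agreement corrected for agreement expected by chance."""
    n = len(judge)
    p_observed = agreement_rate(judge, human)
    judge_counts, human_counts = Counter(judge), Counter(human)
    labels = set(judge_counts) | set(human_counts)
    # Chance agreement: probability that both raters pick the same label independently.
    p_expected = sum(judge_counts[l] * human_counts[l] for l in labels) / (n * n)
    if p_expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (p_observed - p_expected) / (1 - p_expected)
```

Running the same computation once per model would give a like-for-like comparison between Claude 3.5 and Claude 3.7.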

We are ready to switch to Claude 3.7.

Edited by Meir Benayoun