Evaluation of Claude 4.0
Execution Plan
- Run CEF with Claude 4.0
- Check that the logs show the correct prompt_version and parameters
- Manual review on a small random sample
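
The log check and the sampling step above could be sketched in Python as follows. This is only an illustration under stated assumptions, not part of CEF: it assumes the logs can be read as JSON Lines with a `prompt_version` field on each record, and the helper names `find_mismatched_records` and `draw_review_sample` are hypothetical.

```python
import json
import random


def find_mismatched_records(log_lines, expected_version):
    """Return parsed log records whose prompt_version is not the expected one.

    Assumption: each non-blank line is a JSON object. The field name
    `prompt_version` comes from the plan; the log layout is hypothetical.
    """
    mismatched = []
    for line in log_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        # A missing field counts as a mismatch, since the version cannot be confirmed.
        if record.get("prompt_version") != expected_version:
            mismatched.append(record)
    return mismatched


def draw_review_sample(items, sample_size, seed=0):
    """Draw a reproducible random sample of items for manual review."""
    rng = random.Random(seed)  # fixed seed so reviewers can re-draw the same sample
    pool = list(items)
    size = min(sample_size, len(pool))  # never ask for more items than exist
    return rng.sample(pool, size)
```

Recording the seed next to the reviewed sample would let anyone regenerate exactly the same set of items later.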
Resources
- The dataset used for the manual review will be documented in a spreadsheet (TBD)
- Link to the relevant logs (TBD)
- Prompt used for Claude 4.0 is TBD
- Prompt definitions to be updated with Claude 4.0 configuration
Conclusion
TBD - Results will be documented here after evaluation is complete
Expected Outcomes
- Validate Claude 4.0 performance against previous Claude versions
- Ensure accuracy and quality of vulnerability resolution suggestions
- Identify any model-specific issues or improvements
- Document any necessary prompt adjustments for Claude 4.0