Evaluation of Claude 4.0

Execution Plan

Run CEF with Claude 4.0
Check that the logs show the correct prompt_version and parameters
Manually review a small random sample of the results
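
The log check and the sampling step could be scripted roughly as follows. This is only a sketch under stated assumptions: it assumes the logs are available as JSON lines, and the field names (`prompt_version`, `temperature`), the expected values, and the helper names are hypothetical placeholders, since the actual prompt and parameters are still TBD.

```python
import json
import random

# Hypothetical expected values; the real ones are still TBD in this issue.
EXPECTED = {"prompt_version": "claude-4.0-draft", "temperature": 0.0}


def find_mismatches(log_lines, expected):
    """Return (line_number, field, actual) for every field that differs from expected."""
    mismatches = []
    for number, line in enumerate(log_lines, start=1):
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)  # assumes one JSON object per log line
        for field, want in expected.items():
            actual = record.get(field)
            if actual != want:
                mismatches.append((number, field, actual))
    return mismatches


def sample_for_review(records, k, seed=0):
    """Pick up to k records reproducibly so the reviewed sample can be documented."""
    rng = random.Random(seed)
    return rng.sample(records, min(k, len(records)))
```

Fixing the seed means the same sample can be regenerated later and matched against the spreadsheet used for the manual review.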

Resources

The dataset used for the manual review will be documented in a spreadsheet (TBD)
Link to the relevant logs (TBD)
Prompt used for Claude 4.0 (TBD)
- Prompt definitions are to be updated with the Claude 4.0 configuration

Conclusion

TBD. Results will be documented here once the evaluation is complete.

Expected Outcomes

Validate Claude 4.0 performance against previous Claude versions
Ensure accuracy and quality of vulnerability resolution suggestions
Identify any model-specific issues or improvements
Document any necessary prompt adjustments for Claude 4.0
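
One possible way to make the comparison against previous Claude versions concrete is to summarize the manual review as an acceptance rate per model. The sketch below assumes each reviewed suggestion is recorded as a boolean verdict; that format, the model labels, and the function names are hypothetical and not defined by this issue.

```python
def acceptance_rate(verdicts):
    """Fraction of reviewed suggestions marked acceptable (True)."""
    if not verdicts:
        raise ValueError("no verdicts to summarize")
    return sum(1 for v in verdicts if v) / len(verdicts)


def compare_models(verdicts_by_model, baseline):
    """Return each model's acceptance rate minus the baseline model's rate."""
    base = acceptance_rate(verdicts_by_model[baseline])
    return {
        model: acceptance_rate(verdicts) - base
        for model, verdicts in verdicts_by_model.items()
        if model != baseline
    }
```

With a small sample, differences of this kind are only indicative and are best read alongside the reviewers' notes.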

Edited Aug 11, 2025 by 🤖 GitLab Bot 🤖