Add violation_data to scan_result_policy_violations

Why are we doing this work

Based on the findings of Spike: Investigate API improvements to provide ... (#426582 - closed), we want to extend the scan_result_policy_violations table to store violation_data.

In this data, we want to persist additional information that helps user understand what exactly caused policy violations in an MR. This would include a list of violations for each policy type, or error that prevented the policy from being applied / evaluated.

Relevant links

Non-functional requirements

Documentation:
Feature flag:
Performance:
Testing:

Implementation plan

Add column violation_data:

add_column :scan_result_policy_violations, :violation_data, :jsonb, null: true

Add schema for this column

class Security::ScanResultPolicyViolation
  validates :violation_data, json_schema: { filename: "filename" }
end

Extend UpdateViolationsService to be able to set the violation_data for a given scan_result_policy_id:

diff --git a/ee/app/services/security/security_orchestration_policies/update_violations_service.rb b/ee/app/services/security/security_orchestration_policies/update_violations_service.rb
index de03c3f7ebd709095ea85adb429ae975ec7df10a..f316802c0e208bd7a8ab32a1cc8bcf2d3ab8e083 100644
--- a/ee/app/services/security/security_orchestration_policies/update_violations_service.rb
+++ b/ee/app/services/security/security_orchestration_policies/update_violations_service.rb
@@ -5,12 +5,14 @@ module SecurityOrchestrationPolicies
   class UpdateViolationsService
     attr_reader :merge_request,
       :violated_policy_ids,
-        :unviolated_policy_ids
+        :unviolated_policy_ids,
+        :violation_data

     def initialize(merge_request)
       @merge_request = merge_request
       @violated_policy_ids = Set.new
       @unviolated_policy_ids = Set.new
+        @violation_data = {}
     end

     def add(violated_ids, unviolated_ids)
@@ -18,6 +20,10 @@ def add(violated_ids, unviolated_ids)
       unviolated_policy_ids.merge(unviolated_ids)
     end

+      def set_violation_data(policy_id, data)
+        @violation_data[policy_id] = data
+      end
+
     def execute
       return [violated_policy_ids.clear, unviolated_policy_ids.clear] unless Feature.enabled?(
         :scan_result_any_merge_request, merge_request.project)
@@ -40,10 +46,13 @@ def delete_violations

     def create_violations
       attrs = violated_policy_ids.map do |id|
-          { scan_result_policy_id: id, merge_request_id: merge_request.id, project_id: merge_request.project_id }
+          { scan_result_policy_id: id, merge_request_id: merge_request.id, project_id: merge_request.project_id,
+            violation_data: violation_data[id] }
       end

-        Security::ScanResultPolicyViolation.insert_all(attrs) if attrs.any?
+        return unless attrs.any?
+
+        Security::ScanResultPolicyViolation.upsert_all(attrs, unique_by: %w[scan_result_policy_id merge_request_id])
     end
   end
 end

Proposal for the schema based on what we want to be able to store:

Violations:
- scan_finding rules: newly detected and previously existing uuid that caused violation. Example:
```
{"violations": {"uuids": {"newly_detected": ["48cf6bb7-3c8b-5bad-a07a-0104d50ab74f"], "previously_existing": ["6885fb09-428e-5471-ab39-762366eefcba"]}}}
```
- license_scanning rules: violated license names. Example:
```
{"violations": {"licenses": ["MIT"]}
```
- any_merge_request rules: whether unsigned or any commit caused the violation and approvals to be required
```
{"violations": {"commits": "unsigned"}}
```
Errors:
- scan removed. Example:
```
{"error": "SCAN_REMOVED"}
```
- scan unenforceable because pipelines are not configured

Verification steps

Edited Dec 11, 2023 by Martin Čavoj