Skip to content

Improve sorting on `MergeReportsService`

Why are we doing this work

Currently, we are sorting the artifacts(only the dependency_scanning ones) in the MergeReportsService to have consistent report results regardless of the order of the artifacts created in the database. But this logic runs only for the dependency_scanning artifacts which can lead to inconsistent results for the other type of scans if the users are using more than one scanner for the same scan type.

My proposal is using the same logic we have in the StoreGroupedScans service which also sorts the artifacts based on the job names which are unique.

artifacts.sort_by { |artifact| [scanner_order_for(artifact), artifact.job.name] }

def scanner_order_for(artifact)
  MergeReportsService::ANALYZER_ORDER.fetch(artifact.security_report.primary_scanner&.external_id, Float::INFINITY)
end

If we use the same logic in both the MergeReportsService and StoreGroupedScans service, we can encapsulate this logic in one place.

Implementation plan

  • I believe the best place to encapsulate the analyzer priority is the Gitlab::Ci::Reports::Security::Scanner therefore we should

    • Move the ANALYZER_ORDER constant from the Security::MergeReportsService to Gitlab::Ci::Reports::Security::Scanner
    • Implement the method called priority which returns the priority of a scanner by using the ANALYZER_ORDER constant
  • The best place to encapsulate the sorting logic is the report entity itself therefore we should implement <=> method in the Gitlab::Ci::Reports::Security::Report then we can just use source_reports.sort! to sort the reports based on priority. This method should use the priority of the first scanner and job name to make the sorting stable.

A rough implementation would look something like this;
--- a/ee/app/services/security/merge_reports_service.rb
+++ b/ee/app/services/security/merge_reports_service.rb
@@ -2,19 +2,10 @@

 module Security
   class MergeReportsService
-    ANALYZER_ORDER = {
-      "bundler_audit" => 1,
-      "retire.js" =>  2,
-      "gemnasium" => 3,
-      "gemnasium-maven" => 3,
-      "gemnasium-python" => 3,
-      "unknown" => 999
-    }.freeze
-
     def initialize(*source_reports)
       @source_reports = source_reports
-      # temporary sort https://gitlab.com/gitlab-org/gitlab/-/issues/213839
-      sort_by_ds_analyzers!
+      @source_reports.sort!
+
       @target_report = ::Gitlab::Ci::Reports::Security::Report.new(
         @source_reports.first.type,
         @source_reports.first.pipeline,
@@ -84,19 +75,5 @@ def copy_findings_to_target

       @findings.each { |finding| @target_report.add_finding(finding) }
     end
-
-    def sort_by_ds_analyzers!
-      return if @source_reports.any? { |x| x.type != :dependency_scanning }
-
-      @source_reports.sort! do |a, b|
-        a_scanner_id, b_scanner_id = a.scanners.values[0].external_id, b.scanners.values[0].external_id
-
-        # for custom analyzers
-        a_scanner_id = "unknown" if ANALYZER_ORDER[a_scanner_id].nil?
-        b_scanner_id = "unknown" if ANALYZER_ORDER[b_scanner_id].nil?
-
-        ANALYZER_ORDER[a_scanner_id] <=> ANALYZER_ORDER[b_scanner_id]
-      end
-    end
   end
 end
diff --git a/ee/app/services/security/store_grouped_scans_service.rb b/ee/app/services/security/store_grouped_scans_service.rb
index a365edc8df8..7ba13db2805 100644
--- a/ee/app/services/security/store_grouped_scans_service.rb
+++ b/ee/app/services/security/store_grouped_scans_service.rb
@@ -43,14 +43,7 @@ def report_type
     end

     def sorted_artifacts
-      @sorted_artifacts ||= artifacts.sort_by { |artifact| [scanner_order_for(artifact), artifact.job.name] }
-    end
-
-    # This method returns the priority of scanners for dependency_scanning
-    # and `INFINITY` for all the other scan types. There is no problem with
-    # calling this method for all the scan types to get rid of branching.
-    def scanner_order_for(artifact)
-      MergeReportsService::ANALYZER_ORDER.fetch(artifact.security_report.primary_scanner&.external_id, Float::INFINITY)
+      @sorted_artifacts ||= artifacts.sort_by(&:security_report)
     end

     def store_scan_for(artifact, deduplicate)
diff --git a/ee/lib/gitlab/ci/reports/security/report.rb b/ee/lib/gitlab/ci/reports/security/report.rb
index 5665478ebca..41779fee872 100644
--- a/ee/lib/gitlab/ci/reports/security/report.rb
+++ b/ee/lib/gitlab/ci/reports/security/report.rb
@@ -57,6 +57,10 @@ def merge!(other)
           def primary_scanner
             scanners.first&.second
           end
+
+          def <=>(other)
+            primary_scanner.priority <=> other.primary_scanner.priority
+          end
         end
       end
     end
diff --git a/ee/lib/gitlab/ci/reports/security/scanner.rb b/ee/lib/gitlab/ci/reports/security/scanner.rb
index 43ffdda5fb7..052771419a6 100644
--- a/ee/lib/gitlab/ci/reports/security/scanner.rb
+++ b/ee/lib/gitlab/ci/reports/security/scanner.rb
@@ -5,6 +5,15 @@ module Ci
     module Reports
       module Security
         class Scanner
+          ANALYZER_ORDER = {
+            "bundler_audit" => 1,
+            "retire.js" =>  2,
+            "gemnasium" => 3,
+            "gemnasium-maven" => 3,
+            "gemnasium-python" => 3,
+            "unknown" => 999
+          }.freeze
+
           attr_accessor :external_id, :name, :vendor

           def initialize(external_id:, name:, vendor:)
@@ -28,6 +37,10 @@ def to_hash
           def ==(other)
             other.external_id == external_id
           end
+
+          def priority
+            ANALYZER_ORDER.fetch(name, Float::INFINITY)
+          end
         end
       end
     end
Edited by Mehmet Emin INAC