Add highest version support when querying compressed package metadata

Problem to solve

As discussed here, we had to disable support for checking the highest_version field when querying compressed license information because semver_dialects doesn't tell if a version is invalid. This can result in misclassifications where an invalid version gets licenses from the default license set.

Further details

The following tables distinguish b/w 3 possible classification errors when comparing to the highest_version field.

Problem Example Solutions
invalid version compared to highest version dev-master check version using regex or other
error when comparing to highest version pre-releases, RC, etc. use the same comparison logic on exporter and backend
error when comparing to lowest version same use the same comparison logic on exporter and backend

There are 2 possible misclassifications:

  • default license set instead of unknown
  • unknown instead of default license set

Right now default license set instead of unknown is the only possible misclassification related to this issue.

As of today the exact impact of these misclassifications is unknown.

Proposal

  1. Align exporter and monolith regular expressions

  2. Re-add #unsupported_version method

Implementation plan

  1. Add new IDX constraints for lowest and highest versions in ee/app/models/package_metadata/package.rb

  2. Add new #lowest and #highest attribute methods in ee/app/models/package_metadata/package.rb

  3. Add regular expression from exporter as a constant (VERSION_REGEXP_RAW) in ee/app/models/package_metadata/package.rb

  4. Check regular expression in license_ids_for, returning [] when it doesn't match in ee/app/models/package_metadata/package.rb

  5. Add #unsupported_version? and check version, returning [] when it is out of bounds in ee/app/models/package_metadata/package.rb

  6. Add new test cases to ee/spec/models/package_metadata/package_spec.rb and fix any broken specs

  7. Re-add the guard clause return if unsupported_version?(version) that was removed here.

From ecb2423806cfe714722ff0844f8118dcf260093c Mon Sep 17 00:00:00 2001
From: Philip Cunningham <pcunningham@gitlab.com>
Date: Mon, 2 Oct 2023 11:52:34 +0100
Subject: [PATCH] Check supported versions when querying compressed package
 metadata

- Adds new IDX constants
- Adds exporter regular expression
- Adds version bound check
---
 ee/app/models/package_metadata/package.rb     | 37 ++++++++++++++++++-
 .../license_scanning/package_licenses_spec.rb | 10 +----
 2 files changed, 38 insertions(+), 9 deletions(-)

diff --git a/ee/app/models/package_metadata/package.rb b/ee/app/models/package_metadata/package.rb
index 5538c17de524..12bb277f3cd2 100644
--- a/ee/app/models/package_metadata/package.rb
+++ b/ee/app/models/package_metadata/package.rb
@@ -3,7 +3,13 @@
 module PackageMetadata
   class Package < ApplicationRecord
     DEFAULT_LICENSES_IDX = 0
-    OTHER_LICENSES_IDX   = 3
+    LOWEST_VERSION_IDX = 1
+    HIGHEST_VERSION_IDX = 2
+    OTHER_LICENSES_IDX = 3
+
+    # The regular expression used by the exporter
+    # https://pkg.go.dev/github.com/hashicorp/go-version#pkg-constants
+    VERSION_REGEXP_RAW = /.../
 
     has_many :package_versions, inverse_of: :package, foreign_key: :pm_package_id
 
@@ -16,13 +22,34 @@ class Package < ApplicationRecord
     validates :licenses, json_schema: { filename: 'pm_package_licenses' }, if: -> { licenses.present? }
 
     def license_ids_for(version:)
+      # if the given version is greater than the highest known version or lower
+      # than the lowest known version, then the version is not supported, in
+      # which case we return an empty array.
+      return [] if unsupported_version?(version)
+
       # if the package metadata has not yet completed syncing, then the licenses field
       # might be nil, in which case we return an empty array.
       return [] if licenses.blank?
 
+      # if the version does not match the exporter format, return an empty array.
+      # https://gitlab.com/gitlab-org/gitlab/-/issues/410434
+      return [] unless VERSION_REGEXP_RAW.match(version)
+
       matching_other_license_ids(version: version) || default_license_ids
     end
 
+    def unsupported_version?(input_version)
+      interval = VersionParser.parse("=#{input_version}")
+
+      lower = VersionRange.new
+      lower.add(VersionParser.parse("<#{lowest_version}"))
+
+      higher = VersionRange.new
+      higher.add(VersionParser.parse(">#{highest_version}"))
+
+      lower.overlaps_with?(interval) || higher.overlaps_with?(interval)
+    end
+
     # Takes an array of components and uses the purl_type and name fields to search for matching
     # packages
     def self.packages_for(components:)
@@ -54,6 +81,14 @@ def default_license_ids
       licenses[DEFAULT_LICENSES_IDX]
     end
 
+    def lowest_version
+      licenses[LOWEST_VERSION_IDX]
+    end
+
+    def highest_version
+      licenses[HIGHEST_VERSION_IDX]
+    end
+
     def other_licenses
       licenses[OTHER_LICENSES_IDX]
     end
diff --git a/ee/spec/lib/gitlab/license_scanning/package_licenses_spec.rb b/ee/spec/lib/gitlab/license_scanning/package_licenses_spec.rb
index 2fd9113d3435..923d99c3034b 100644
--- a/ee/spec/lib/gitlab/license_scanning/package_licenses_spec.rb
+++ b/ee/spec/lib/gitlab/license_scanning/package_licenses_spec.rb
@@ -393,9 +393,7 @@
         where(:case_name, :name, :purl_type, :version) do
           "name does not match"      | "does-not-match" | "golang" | "v1.10.0"
           "purl_type does not match" | "beego"          | "npm"    | "v1.10.0"
-          # TODO: re-enable the following when https://gitlab.com/gitlab-org/vulnerability-research/foss/semver_dialects/-/issues/3
-          # has been completed.
-          # "version is invalid"     | "beego"          | "golang" | "invalid-version"
+          "version is invalid"       | "beego"          | "golang" | "invalid-version"
         end
 
         with_them do
@@ -426,11 +424,7 @@
           it "returns the default licenses" do
             expect(fetch).to eq([
               "name" => "beego", "purl_type" => "golang", "version" => "invalid-version", "path" => "",
-              "licenses" => [{
-                "name" => "Default License 2.1",
-                "spdx_identifier" => "DEFAULT-2.1",
-                "url" => "https://spdx.org/licenses/DEFAULT-2.1.html"
-              }]
+              "licenses" => [{ "name" => "unknown", "spdx_identifier" => "unknown", "url" => nil }]
             ])
           end
         end
-- 
2.42.0

Note

Please note that the proposed logic depicted in the patch above should be placed within default_license_ids so that non-compliant versions that are valid are still handled.

/cc @gonzoyumo @smeadzinger @fcatteau @ifrenkel

Edited by Philip Cunningham