Add highest version support when querying compressed package metadata
Problem to solve
As discussed here, we had to disable support for checking the highest_version
field when querying compressed license information because semver_dialects
doesn't tell if a version is invalid. This can result in misclassifications where an invalid version gets licenses from the default license set.
Further details
The following tables distinguish b/w 3 possible classification errors when comparing to the highest_version
field.
Problem | Example | Solutions |
---|---|---|
invalid version compared to highest version | dev-master |
check version using regex or other |
error when comparing to highest version | pre-releases, RC, etc. | use the same comparison logic on exporter and backend |
error when comparing to lowest version | same | use the same comparison logic on exporter and backend |
There are 2 possible misclassifications:
- default license set instead of unknown
- unknown instead of default license set
Right now default license set instead of unknown
is the only possible misclassification related to this issue.
As of today the exact impact of these misclassifications is unknown.
Proposal
-
Align exporter and monolith regular expressions
-
Re-add
#unsupported_version
method
Implementation plan
-
Add new IDX constraints for lowest and highest versions in
ee/app/models/package_metadata/package.rb
-
Add new
#lowest
and#highest
attribute methods inee/app/models/package_metadata/package.rb
-
Add regular expression from exporter as a constant (
VERSION_REGEXP_RAW
) inee/app/models/package_metadata/package.rb
-
Check regular expression in
license_ids_for
, returning[]
when it doesn't match inee/app/models/package_metadata/package.rb
-
Add
#unsupported_version?
and check version, returning[]
when it is out of bounds inee/app/models/package_metadata/package.rb
-
Add new test cases to
ee/spec/models/package_metadata/package_spec.rb
and fix any broken specs -
Re-add the guard clause return if unsupported_version?(version)
that was removed here.
From ecb2423806cfe714722ff0844f8118dcf260093c Mon Sep 17 00:00:00 2001
From: Philip Cunningham <pcunningham@gitlab.com>
Date: Mon, 2 Oct 2023 11:52:34 +0100
Subject: [PATCH] Check supported versions when querying compressed package
metadata
- Adds new IDX constants
- Adds exporter regular expression
- Adds version bound check
---
ee/app/models/package_metadata/package.rb | 37 ++++++++++++++++++-
.../license_scanning/package_licenses_spec.rb | 10 +----
2 files changed, 38 insertions(+), 9 deletions(-)
diff --git a/ee/app/models/package_metadata/package.rb b/ee/app/models/package_metadata/package.rb
index 5538c17de524..12bb277f3cd2 100644
--- a/ee/app/models/package_metadata/package.rb
+++ b/ee/app/models/package_metadata/package.rb
@@ -3,7 +3,13 @@
module PackageMetadata
class Package < ApplicationRecord
DEFAULT_LICENSES_IDX = 0
- OTHER_LICENSES_IDX = 3
+ LOWEST_VERSION_IDX = 1
+ HIGHEST_VERSION_IDX = 2
+ OTHER_LICENSES_IDX = 3
+
+ # The regular expression used by the exporter
+ # https://pkg.go.dev/github.com/hashicorp/go-version#pkg-constants
+ VERSION_REGEXP_RAW = /.../
has_many :package_versions, inverse_of: :package, foreign_key: :pm_package_id
@@ -16,13 +22,34 @@ class Package < ApplicationRecord
validates :licenses, json_schema: { filename: 'pm_package_licenses' }, if: -> { licenses.present? }
def license_ids_for(version:)
+ # if the given version is greater than the highest known version or lower
+ # than the lowest known version, then the version is not supported, in
+ # which case we return an empty array.
+ return [] if unsupported_version?(version)
+
# if the package metadata has not yet completed syncing, then the licenses field
# might be nil, in which case we return an empty array.
return [] if licenses.blank?
+ # if the version does not match the exporter format, return an empty array.
+ # https://gitlab.com/gitlab-org/gitlab/-/issues/410434
+ return [] unless VERSION_REGEXP_RAW.match(version)
+
matching_other_license_ids(version: version) || default_license_ids
end
+ def unsupported_version?(input_version)
+ interval = VersionParser.parse("=#{input_version}")
+
+ lower = VersionRange.new
+ lower.add(VersionParser.parse("<#{lowest_version}"))
+
+ higher = VersionRange.new
+ higher.add(VersionParser.parse(">#{highest_version}"))
+
+ lower.overlaps_with?(interval) || higher.overlaps_with?(interval)
+ end
+
# Takes an array of components and uses the purl_type and name fields to search for matching
# packages
def self.packages_for(components:)
@@ -54,6 +81,14 @@ def default_license_ids
licenses[DEFAULT_LICENSES_IDX]
end
+ def lowest_version
+ licenses[LOWEST_VERSION_IDX]
+ end
+
+ def highest_version
+ licenses[HIGHEST_VERSION_IDX]
+ end
+
def other_licenses
licenses[OTHER_LICENSES_IDX]
end
diff --git a/ee/spec/lib/gitlab/license_scanning/package_licenses_spec.rb b/ee/spec/lib/gitlab/license_scanning/package_licenses_spec.rb
index 2fd9113d3435..923d99c3034b 100644
--- a/ee/spec/lib/gitlab/license_scanning/package_licenses_spec.rb
+++ b/ee/spec/lib/gitlab/license_scanning/package_licenses_spec.rb
@@ -393,9 +393,7 @@
where(:case_name, :name, :purl_type, :version) do
"name does not match" | "does-not-match" | "golang" | "v1.10.0"
"purl_type does not match" | "beego" | "npm" | "v1.10.0"
- # TODO: re-enable the following when https://gitlab.com/gitlab-org/vulnerability-research/foss/semver_dialects/-/issues/3
- # has been completed.
- # "version is invalid" | "beego" | "golang" | "invalid-version"
+ "version is invalid" | "beego" | "golang" | "invalid-version"
end
with_them do
@@ -426,11 +424,7 @@
it "returns the default licenses" do
expect(fetch).to eq([
"name" => "beego", "purl_type" => "golang", "version" => "invalid-version", "path" => "",
- "licenses" => [{
- "name" => "Default License 2.1",
- "spdx_identifier" => "DEFAULT-2.1",
- "url" => "https://spdx.org/licenses/DEFAULT-2.1.html"
- }]
+ "licenses" => [{ "name" => "unknown", "spdx_identifier" => "unknown", "url" => nil }]
])
end
end
--
2.42.0
Note
Please note that the proposed logic depicted in the patch above should be placed within default_license_ids
so that non-compliant versions that are valid are still handled.