License extraction from package metadata for dependency firewall

What does this MR do and why?

Before the Dependency Firewall can decide whether to block or warn on a package, it needs to know what licenses that package is distributed under. This MR wires up the license lookup so that when a firewall check is triggered, we go and fetch that information.

This MR adds one new service and extends an existing one:

FetchPackageLicensesService

This service takes a package's name, purl_type, and version and looks up its licenses in the package metadata database, backed by the pm_packages and pm_licenses tables. It uses the existing Gitlab::LicenseScanning::PackageLicenses infrastructure that the rest of the platform already relies on for license scanning.

If the package isn't in the package metadata database, or there's no data for that specific version, it returns an empty array rather than an error. That's intentional — missing license data means we can't make a policy decision, not that something went wrong.
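
To make the "no data is not an error" contract concrete, here is a minimal in-memory sketch. It assumes a hypothetical per-package record with a default license set for a version range plus explicit per-version overrides, loosely mirroring the "version-specific licenses vs. default license range" cases in the tests. PackageRecord, LicenseLookupSketch and version_overrides are illustrative names, not GitLab's actual storage format or the Gitlab::LicenseScanning::PackageLicenses API. Gem::Version stands in for ecosystem-specific version comparison.

```ruby
require "rubygems" # provides Gem::Version (normally loaded by default)

# Hypothetical record: default licenses for a version range, plus explicit
# per-version overrides. Not GitLab's actual pm_packages format.
PackageRecord = Struct.new(:default_licenses, :lowest_version, :highest_version, :version_overrides,
                           keyword_init: true)

class LicenseLookupSketch
  # packages: { [purl_type, name] => PackageRecord }
  def initialize(packages)
    @packages = packages
  end

  # Returns license identifiers; [] means "no data", never an error.
  def licenses_for(purl_type:, name:, version:)
    package = @packages[[purl_type, name]]
    return [] if package.nil? # package not in the metadata database

    # An explicit per-version entry wins over the default range.
    override = package.version_overrides.find { |entry| entry[:versions].include?(version) }
    return override[:licenses] if override

    # Gem::Version is a stand-in; npm, Maven, etc. order versions differently.
    return [] unless Gem::Version.correct?(version)

    requested = Gem::Version.new(version)
    lowest = Gem::Version.new(package.lowest_version)
    highest = Gem::Version.new(package.highest_version)
    requested.between?(lowest, highest) ? package.default_licenses : []
  end
end

lookup = LicenseLookupSketch.new({
  ["npm", "example-pkg"] => PackageRecord.new(
    default_licenses: ["MIT"], lowest_version: "1.0.0", highest_version: "2.0.0",
    version_overrides: [{ versions: ["0.9.0"], licenses: ["Apache-2.0"] }]
  )
})
lookup.licenses_for(purl_type: "npm", name: "example-pkg", version: "1.5.0") # => ["MIT"]
lookup.licenses_for(purl_type: "npm", name: "example-pkg", version: "3.0.0") # => []
```

Returning an empty array for both "package unknown" and "version out of range" keeps the caller's decision simple: no licenses means no basis for a policy decision.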

License filtering follows the same pattern as Sbom::Ingestion::LicensesFetcher: entries are excluded if the spdx_identifier is blank (nil or empty) or matches the UNKNOWN_LICENSE sentinel that PackageLicenses uses when no data is available. Results are returned as plain Ruby hashes rather than Hashie::Mash objects, so callers get predictable symbol-keyed data regardless of what the underlying infrastructure returns.

Returned licenses look like:

[{ spdx_identifier: "Apache-2.0", name: "Apache License 2.0", url: "https://spdx.org/licenses/Apache-2.0.html" }]
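
The filtering and normalization step can be sketched as follows. UNKNOWN_SPDX_IDENTIFIER is a hypothetical stand-in for the identifier carried by PackageLicenses' UNKNOWN_LICENSE sentinel (its real value is not shown here), and normalize_licenses is an illustrative name. Calling to_h and symbolizing keys covers string-keyed input such as a Hashie::Mash.

```ruby
# Assumption: stand-in for the spdx_identifier of the UNKNOWN_LICENSE sentinel.
UNKNOWN_SPDX_IDENTIFIER = "unknown"

# Drops blank or unknown entries and returns plain symbol-keyed hashes,
# whatever key style (string, symbol, Mash-like) the input uses.
def normalize_licenses(raw_licenses)
  raw_licenses.filter_map do |license|
    attrs = license.to_h.transform_keys(&:to_sym)
    spdx = attrs[:spdx_identifier]
    # nil, "" and whitespace-only identifiers are all treated as blank.
    next if spdx.to_s.strip.empty? || spdx == UNKNOWN_SPDX_IDENTIFIER

    { spdx_identifier: spdx, name: attrs[:name], url: attrs[:url] }
  end
end

raw = [
  { "spdx_identifier" => "Apache-2.0", "name" => "Apache License 2.0",
    "url" => "https://spdx.org/licenses/Apache-2.0.html" },
  { "spdx_identifier" => "unknown", "name" => "Unknown", "url" => nil },
  { spdx_identifier: "", name: "Empty" },
  { spdx_identifier: nil, name: "Nil" }
]
normalize_licenses(raw)
# => [{ spdx_identifier: "Apache-2.0", name: "Apache License 2.0", url: "https://spdx.org/licenses/Apache-2.0.html" }]
```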

EnforcementService (extended)

The existing enforcement service gains PURL validation, PURL parsing, and a license fetch:

  • Validation: PURL_REGEXP checks that the incoming purl is well-formed per the purl-spec. The type character set ([a-zA-Z0-9.\-]) matches the ECMA-427 spec exactly.
  • Parsing: purl_type, purl_name, and purl_version extract the component parts needed to query the package metadata database. Percent-encoded characters in the name (e.g. %40 for @ in scoped npm packages) are decoded.
  • License fetch: after all input validation passes, the service calls FetchPackageLicensesService and propagates any error back to the caller.
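
A rough sketch of the validation and parsing steps, for illustration only. This regex is not the MR's PURL_REGEXP: it reuses the type character set quoted above, assumes a version is required (the license lookup needs one), and accepts but ignores qualifiers and subpath. Joining the decoded namespace and name with "/" (so pkg:npm/%40angular/core yields @angular/core) is an assumption about how names are keyed in the package metadata database. PurlSketch and ParsedPurl are hypothetical names.

```ruby
# Hypothetical sketch, not the MR's PURL_REGEXP.
module PurlSketch
  PURL_REGEXP = %r{\Apkg:(?<type>[a-zA-Z0-9.\-]+)/(?<path>[^@?#\s]+)@(?<version>[^?#\s]+)(?:\?[^#]*)?(?:#.*)?\z}

  ParsedPurl = Struct.new(:purl_type, :purl_name, :purl_version)

  module_function

  # Decodes %XX escapes byte by byte; unlike form decoding, '+' is left alone.
  def percent_decode(str)
    str.b.gsub(/%(\h\h)/) { [Regexp.last_match(1)].pack("H2") }.force_encoding(Encoding::UTF_8)
  end

  # Returns a ParsedPurl, or nil when the PURL is malformed.
  def parse(purl)
    match = PURL_REGEXP.match(purl)
    return nil if match.nil?

    segments = match[:path].split("/", -1)
    # Reject empty segments, e.g. the empty name in "pkg:npm/foo/@1.0.0".
    return nil if segments.any?(&:empty?)

    ParsedPurl.new(
      match[:type].downcase, # PURL types are case-insensitive; lowercase is canonical
      segments.map { |segment| percent_decode(segment) }.join("/"),
      percent_decode(match[:version])
    )
  end
end

PurlSketch.parse("pkg:npm/%40angular/core@16.2.0").purl_name # => "@angular/core"
PurlSketch.parse("npm/lodash@4.17.21")                        # => nil (missing scheme)
```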

Currently, once the licenses are fetched successfully, the service returns SUCCESS_ALLOWED without acting on them. Policy evaluation comes in a follow-up issue: https://gitlab.com/gitlab-org/gitlab/-/work_items/593844+s. The point of this MR is to get the license data flowing through the right path, with the right shape, ready for that evaluation step.
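
The control flow of the extended enforcement service can be sketched with the parser and license fetcher injected as callables. Response is a minimal hypothetical stand-in for a service response object (not GitLab's ServiceResponse), and the value of SUCCESS_ALLOWED is a placeholder, since the real constant is not shown here. The parser is assumed to return the name, purl_type and version that FetchPackageLicensesService takes.

```ruby
# Minimal stand-in for a service response object (hypothetical).
Response = Struct.new(:status, :message, :payload, keyword_init: true) do
  def success?
    status == :success
  end
end

# Hypothetical sketch of the enforcement flow; dependencies are injected.
class EnforcementFlowSketch
  SUCCESS_ALLOWED = :success_allowed # placeholder value, not the real constant

  def initialize(parser:, license_fetcher:)
    @parser = parser                   # purl -> { purl_type:, name:, version: } or nil
    @license_fetcher = license_fetcher # (purl_type:, name:, version:) -> Response
  end

  def execute(purl)
    parsed = @parser.call(purl)
    return Response.new(status: :error, message: "Invalid PURL") if parsed.nil?

    licenses_response = @license_fetcher.call(**parsed)
    # Propagate a license fetch error to the caller as-is.
    return licenses_response unless licenses_response.success?

    # Licenses are fetched but not yet evaluated against any policy.
    Response.new(status: :success, payload: { result: SUCCESS_ALLOWED })
  end
end
```

Injecting the collaborators keeps the sketch testable without a database; the follow-up policy evaluation would slot in where the comment notes that licenses are not yet evaluated.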

What this MR does NOT do

  • Policy evaluation — that's the next issue
  • Storing licenses on the package record — not needed, the package metadata database is the source of truth
  • Adding any caching layer for db results
  • Query batching

Testing

Both services have full unit test coverage:

  • Valid and invalid PURLs (including edge cases like empty name segment, missing scheme, names with special characters)
  • Maven and non-Maven package types, including multi-license packages
  • Version-specific licenses vs. default license range
  • Package not found in package metadata database → empty licenses, still succeeds
  • Version out of range in package metadata database → empty licenses
  • UNKNOWN_LICENSE sentinel filtered out, blank and nil spdx_identifier filtered out
  • License fetch error propagated correctly back to caller
  • End-to-end: enforcement service fetches real license data from the package metadata database and returns SUCCESS_ALLOWED

References

https://gitlab.com/gitlab-org/gitlab/-/work_items/593843+s

MR acceptance checklist

Evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.

Related to #593843

Edited by Hannah Baker
