License extraction from package metadata for dependency firewall
What does this MR do and why?
Before the Dependency Firewall can decide whether to block or warn on a package, it needs to know what licenses that package is distributed under. This MR wires up the license lookup so that when a firewall check is triggered, we go and fetch that information.
There are two new services:
FetchPackageLicensesService
This service takes a package's name, purl_type, and version and looks up its licenses in the package metadata database, backed by the pm_packages and pm_licenses tables. It uses the existing Gitlab::LicenseScanning::PackageLicenses infrastructure that the rest of the platform already relies on for license scanning.
If the package isn't in the package metadata database, or there's no data for that specific version, it returns an empty array rather than an error. That's intentional — missing license data means we can't make a policy decision, not that something went wrong.
License filtering follows the same pattern as Sbom::Ingestion::LicensesFetcher: entries are excluded if the spdx_identifier is blank (nil or empty) or matches the UNKNOWN_LICENSE sentinel that PackageLicenses uses when no data is available. Results are returned as plain Ruby hashes rather than Hashie::Mash objects, so callers get predictable symbol-keyed data regardless of what the underlying infrastructure returns.
Returned licenses look like:
[{ spdx_identifier: "Apache-2.0", name: "Apache License 2.0", url: "https://spdx.org/licenses/Apache-2.0.html" }]
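The filtering and normalization described above can be sketched as follows. This is a minimal illustration, not the service's actual code: `RawLicense`, `SKETCH_UNKNOWN_LICENSE`, and `normalize_licenses` are hypothetical names, and the sentinel value used here is an assumption (the real one is defined by PackageLicenses).

```ruby
# Assumed sentinel value for illustration only; the real constant belongs to
# Gitlab::LicenseScanning::PackageLicenses and may differ.
SKETCH_UNKNOWN_LICENSE = "unknown"

# Hypothetical stand-in for the objects the lookup returns
# (Hashie::Mash in the real infrastructure).
RawLicense = Struct.new(:spdx_identifier, :name, :url, keyword_init: true)

# Drops entries with a blank or sentinel spdx_identifier and returns
# plain symbol-keyed hashes.
def normalize_licenses(raw_licenses)
  raw_licenses
    .reject { |license| license.spdx_identifier.to_s.strip.empty? }
    .reject { |license| license.spdx_identifier == SKETCH_UNKNOWN_LICENSE }
    .map do |license|
      {
        spdx_identifier: license.spdx_identifier,
        name: license.name,
        url: license.url
      }
    end
end
```

An empty input yields an empty array, which mirrors the "no data is not an error" behaviour described for packages missing from the package metadata database.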
EnforcementService (extended)
The enforcement service now validates and parses the incoming PURL, then fetches the package's licenses:
- Validation: PURL_REGEXP checks that the incoming purl is well-formed per the purl-spec. The type character set ([a-zA-Z0-9.\-]) matches the ECMA-427 spec exactly.
- Parsing: purl_type, purl_name, and purl_version extract the component parts needed to query the package metadata database. Percent-encoding in the name (e.g. %40 for @ in scoped npm packages) is handled correctly.
- License fetch: after all input validation passes, it calls FetchPackageLicensesService and propagates any error back to the caller.
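To make the validation and parsing steps concrete, here is a rough sketch. The pattern below is illustrative only and is not the MR's PURL_REGEXP, which this description does not show. `SKETCH_PURL_REGEXP`, `ParsedPurl`, `percent_decode`, and `parse_purl` are hypothetical names, and namespace handling (for example Maven group IDs) is deliberately left out.

```ruby
# Illustrative pattern: pkg:<type>/<name>[@<version>][?qualifiers][#subpath].
# The type character set follows the one quoted in the description.
SKETCH_PURL_REGEXP =
  %r{\Apkg:(?<type>[a-zA-Z0-9.\-]+)/(?<name>[^@?\#]+)(?:@(?<version>[^?\#]+))?(?:\?[^\#]*)?(?:\#.*)?\z}

ParsedPurl = Struct.new(:type, :name, :version, keyword_init: true)

# Decodes %XX sequences byte by byte, then reinterprets the result as UTF-8.
# Unlike form decoding, "+" is left untouched.
def percent_decode(str)
  str.b.gsub(/%(\h\h)/) { $1.hex.chr }.force_encoding(Encoding::UTF_8)
end

# Returns a ParsedPurl, or nil when the input does not match the pattern.
def parse_purl(purl)
  match = SKETCH_PURL_REGEXP.match(purl.to_s)
  return nil unless match

  ParsedPurl.new(
    type: match[:type],
    name: percent_decode(match[:name]),
    version: match[:version] && percent_decode(match[:version])
  )
end
```

With this sketch, a scoped npm purl such as `pkg:npm/%40angular/core@16.0.0` parses to the name `@angular/core`, while inputs with a missing scheme or an empty name segment return `nil`.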
Currently the service returns SUCCESS_ALLOWED and does not yet act on the response from FetchPackageLicensesService. Policy evaluation comes in a follow-up issue: https://gitlab.com/gitlab-org/gitlab/-/work_items/593844+s. The point of this MR is to get the license data flowing through the right path, with the right shape, ready for that evaluation step.
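The control flow at this stage might look roughly like the sketch below. It assumes a simple result object and an injectable fetcher. `SketchResult`, `SKETCH_SUCCESS_ALLOWED`, and `enforce` are hypothetical names, and the one-line purl check is a placeholder for the real validation.

```ruby
# Hypothetical result object; the real services use their own response type.
SketchResult = Struct.new(:status, :payload, :message, keyword_init: true) do
  def error?
    status == :error
  end
end

# Placeholder for the SUCCESS_ALLOWED outcome named in the description.
SKETCH_SUCCESS_ALLOWED = :allowed

# license_fetcher is any callable that takes a purl and returns a SketchResult.
def enforce(purl, license_fetcher:)
  # Placeholder validation; the real service checks the purl against PURL_REGEXP.
  unless purl.to_s.start_with?("pkg:")
    return SketchResult.new(status: :error, message: "invalid purl")
  end

  fetched = license_fetcher.call(purl)
  # Propagate a fetch failure to the caller unchanged.
  return fetched if fetched.error?

  # fetched.payload holds the licenses, but nothing evaluates them yet.
  SketchResult.new(status: :success, payload: SKETCH_SUCCESS_ALLOWED)
end
```

Injecting the fetcher keeps the sketch testable with a lambda. The license payload is fetched but deliberately unused, since policy evaluation is out of scope here.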
What this MR does NOT do
- Policy evaluation — that's the next issue
- Storing licenses on the package record — not needed, the package metadata database is the source of truth
- Adding any caching layer for db results
- Query batching
Testing
Both services have unit tests covering the following cases:
- Valid and invalid PURLs (including edge cases like empty name segment, missing scheme, names with special characters)
- Maven and non-Maven package types, including multi-license packages
- Version-specific licenses vs. default license range
- Package not found in package metadata database → empty licenses, still succeeds
- Version out of range in package metadata database → empty licenses
- UNKNOWN_LICENSE sentinel filtered out, blank and nil spdx_identifier filtered out
- License fetch error propagated correctly back to caller
- End-to-end: enforcement service fetches real license data from the package metadata database and returns SUCCESS_ALLOWED
References
https://gitlab.com/gitlab-org/gitlab/-/work_items/593843+s
MR acceptance checklist
Evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.
Related to #593843