License Scanning using License DB and SBOM components
<!-- Implementation issues are used to break up a large piece of work into small, discrete tasks that can move independently through the build workflow steps. They're typically used to populate a Feature Epic. Once created, an implementation issue is usually refined in order to populate and review the implementation plan and weight. Example workflow: https://about.gitlab.com/handbook/engineering/development/threat-management/planning/diagram.html#plan -->

**NOTE**: This epic was created before implementing the License Scanning classes in https://gitlab.com/gitlab-org/gitlab/-/merge_requests/105747, so the wording might no longer be accurate.

## Why are we doing this work

The `LicenseScanning` service implemented in https://gitlab.com/gitlab-org/gitlab/-/merge_requests/103574 needs to be changed in order to replace the License Scanning CI job, as captured by https://gitlab.com/groups/gitlab-org/-/epics/8072.

- Legacy behavior: It parses legacy License Scanning JSON artifacts.
- New behavior: It matches SBOM components detected in the project branch with license data imported from the License DB.

## Further details

In the new behavior, License Scanning does the following:

1. Get SBOM components for the given pipeline or the default branch.
1. Query the DB to fetch the licenses of these components. https://gitlab.com/gitlab-org/gitlab/-/issues/373163 defines how that data is stored.
1. Build a new `Report` using its methods, and return it.

It returns a `Report` model, just like the legacy behavior implemented in https://gitlab.com/gitlab-org/gitlab/-/merge_requests/103574+.

The new behavior is behind a feature flag. Also, License Scanning falls back to the legacy behavior when no SBOM data is available.
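The flow described above (collect SBOM components, look up their licenses, build a `Report`, and fall back to the legacy behavior when the feature flag is off or no SBOM data is available) can be sketched in plain Ruby. This is only an illustration under stated assumptions: `Component`, `add_license`, the in-memory `license_db` hash, the `flag_enabled` argument, and the `legacy` callable are hypothetical stand-ins, not the actual GitLab classes, tables, or feature-flag API.

```ruby
# Illustrative sketch only. All names below are hypothetical stand-ins and
# do not mirror the real GitLab implementation.

# An SBOM component identified by PURL type, package name, and version.
Component = Struct.new(:purl_type, :name, :version)

# Minimal stand-in for the Report model: maps a license identifier to the
# names of the components that use it.
class Report
  attr_reader :licenses

  def initialize
    @licenses = Hash.new { |hash, key| hash[key] = [] }
  end

  def add_license(spdx_id, component)
    @licenses[spdx_id] << component.name
    self
  end
end

class LicenseScanning
  # components:   SBOM components found for the pipeline or default branch
  # license_db:   Hash of [purl_type, name, version] => [license ids],
  #               standing in for the License DB tables
  # flag_enabled: whether the (assumed) feature flag is on
  # legacy:       callable that returns the legacy Report
  def initialize(components:, license_db:, flag_enabled:, legacy:)
    @components = components
    @license_db = license_db
    @flag_enabled = flag_enabled
    @legacy = legacy
  end

  def report
    # Fall back to the legacy behavior when the flag is off
    # or when no SBOM data is available.
    return @legacy.call unless @flag_enabled && @components.any?

    @components.each_with_object(Report.new) do |component, report|
      # Components without a matching License DB entry contribute nothing.
      @license_db.fetch(component.to_a, []).each do |spdx_id|
        report.add_license(spdx_id, component)
      end
    end
  end
end

# Usage with made-up data.
components = [Component.new("gem", "rails", "7.0.4")]
license_db = { ["gem", "rails", "7.0.4"] => ["MIT"] }
scanner = LicenseScanning.new(
  components: components,
  license_db: license_db,
  flag_enabled: true,
  legacy: -> { Report.new }
)
scanner.report.licenses.each do |spdx_id, names|
  puts "#{spdx_id}: #{names.join(', ')}"
end
```

Passing the legacy behavior in as a callable keeps the sketch self-contained; in the plan described in this epic, that role is played by the moved legacy service behind the proxy.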
### SBOM components

In order to get the SBOM components of a branch or pipeline, the License Scanning service delegates to one of these:

- if available, an existing finder implemented by gitlab~10690752 as part of https://gitlab.com/groups/gitlab-org/-/epics/8293+
- a simplified finder implemented by gitlab~10690742, which uses ActiveRecord relations; it doesn't cover edge cases, and it will be replaced by a complete implementation as ~"group::threat insights" moves forward with https://gitlab.com/groups/gitlab-org/-/epics/8293

See https://gitlab.com/groups/gitlab-org/-/epics/8532#note_1160236461 and following comments.

### Names of licenses

In this new implementation, the `Report` model doesn't provide the names of the SPDX licenses that have been detected; `#license_names` returns nothing. However, this should be fully compatible with the current implementation of license policies. That's because the Managed API reuses existing records of `software_licenses`, and that table is in sync with the SPDX License List.

See https://gitlab.com/gitlab-org/gitlab/-/issues/379137#note_1160165965

## Relevant links

- https://gitlab.com/gitlab-org/gitlab/-/issues/373163+
- https://gitlab.com/groups/gitlab-org/-/epics/8532+

## Non-functional requirements

<!-- Add details for required items and delete others. -->

- [ ] Documentation:
- [ ] Feature flag:
- [ ] Performance:
- [ ] Testing:

## Implementation plan

The following steps can be implemented in separate MRs:

1. Create a finder that gets the [SBOM components](#sbom-components).
   - It takes a project branch or CI pipeline.
   - It returns tuples of PURL type, package name, and package version.
1. Create a finder that gets licenses of package versions.
   - It queries the DB tables implemented in https://gitlab.com/gitlab-org/gitlab/-/issues/373163+.
   - It takes tuples of PURL type, package name, and package version.
   - It returns the corresponding licenses.
1. Move the existing License Scanning service implementation (legacy behavior), and change the existing License Scanning service so that it delegates to the moved implementation, acting as a simple proxy.
1. Create a new License Scanning service class that implements the new behavior. It uses the two aforementioned finders.
1. Change the License Scanning proxy to switch to the new behavior when it's enabled by a feature flag.
1. Change the License Scanning proxy to fall back to the legacy behavior when the project isn't compatible with the new behavior, even though the feature flag enables the new behavior.

**Enabling the feature flag is out of scope** because we should probably wait until the License DB covers all the package types and the backend is automatically synced with the License DB. See the following epics:

- https://gitlab.com/groups/gitlab-org/-/epics/8492+
- https://gitlab.com/groups/gitlab-org/-/epics/9349+

## Verification steps

Check new behavior:

1. Enable the feature flag.
2. Set up a project supported by SBOM generators or Dependency Scanning, and using package types supported by the License DB.
3. Add the corresponding SBOM generators or Dependency Scanning to the CI config.
4. Trigger a pipeline.
5. Check the licenses in the `License Compliance` page.

Check fallback to legacy behavior:

1. Enable the feature flag.
2. Set up a project supported by the legacy License Scanning job.
3. Include the legacy CI template for License Scanning.
4. Trigger a pipeline.
5. Check the licenses in the `License Compliance` page.

/cc @ifrenkel @hacks4oats
epic