Update package metadata license lookup to use deduplicated data
Why are we doing this work
A new compressed dataset will be added as part of this issue's parent epic. The compressed dataset is meant to replace the existing dataset (though not in this epic). Therefore, the current license lookup provided by Gitlab::LicenseScanning::PackageLicenses needs to be updated to query this dataset.
Note: until the existing dataset is removed, Gitlab::LicenseScanning::PackageLicenses needs to support querying both compressed and uncompressed data.
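Supporting both datasets side by side comes down to a `fetch` that dispatches on the feature flag. The sketch below is a minimal illustration, not the actual class. `PackageLicensesSketch` and the `feature_enabled` callable are hypothetical; in GitLab the check would typically be `Feature.enabled?(:compressed_package_metadata_query)`. The method bodies are placeholders for the real database queries.

```ruby
# Hypothetical sketch of dispatching between the two query paths.
class PackageLicensesSketch
  # `feature_enabled` is a hypothetical callable standing in for GitLab's
  # Feature.enabled?(:compressed_package_metadata_query). It is evaluated on
  # every call to #fetch, as a real flag check would be.
  def initialize(components, feature_enabled:)
    @components = components
    @feature_enabled = feature_enabled
  end

  # Chooses the compressed or uncompressed path based on the flag.
  def fetch
    @feature_enabled.call ? compressed_fetch : uncompressed_fetch
  end

  private

  # Placeholder: the real method would query the existing (uncompressed) tables.
  def uncompressed_fetch
    @components.map { |name| [name, :uncompressed] }
  end

  # Placeholder: the real method would query pm_packages.licenses.
  def compressed_fetch
    @components.map { |name| [name, :compressed] }
  end
end

flag_on = true
lookup = PackageLicensesSketch.new(%w[rails], feature_enabled: -> { flag_on })
p lookup.fetch # => [["rails", :compressed]]
```

Passing a callable instead of a boolean keeps the flag check lazy, so toggling the flag changes the path taken on the next `fetch` without rebuilding the object.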
Relevant links
- Research spike: #407454 (closed)
- Detailed discussion of the `licenses` data structure: #408901 (closed)
Implementation plan
Update Gitlab::LicenseScanning::PackageLicenses:
- Add a new feature flag called `compressed_package_metadata_query`: [Feature flag] Rollout of `compressed_package_m... (#409793 - closed)
- Create two new private methods:
  - `uncompressed_fetch`: contains the code currently in the `fetch` method.
  - `compressed_fetch`: queries the new `licenses` field in the `pm_packages` table for the given `components`, using the following pseudocode:

    ```
    def compressed_fetch
      components.each do |component|
        packages = select packages from pm_packages table
                   where pm_packages.name = component.name
                   and pm_packages.purl_type = component.purl_type

        packages.each do |package|
          licenses = []

          if component.version is contained in package.licenses.other_versions
            licenses = package.licenses.other_licenses
          else
            licenses = package.licenses.default_licenses
          end

          add_record_with_known_licenses(package.purl_type, package.name, component.version, licenses)
        end
      end

      add_records_with_unknown_licenses
    end
    ```

  Update package metadata license lookup to use c... (!119607 - merged)
- Update `fetch`: if the `compressed_package_metadata_query` feature flag is enabled, call `compressed_fetch`; otherwise, call `uncompressed_fetch`.
  Update package metadata license lookup to use c... (!119607 - merged)
- Update all tests to check both sides of the `compressed_package_metadata_query` feature flag: Test both sides of compressed_package_metadata_... (!120207 - merged)
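The `compressed_fetch` pseudocode can be sketched as runnable Ruby to make the version-matching rule concrete. This is a minimal sketch under several assumptions: plain `Struct`s stand in for ActiveRecord rows, an in-memory `select` stands in for the `pm_packages` query, the `licenses` shape (a single `other_versions`/`other_licenses` pair plus `default_licenses`) mirrors the pseudocode rather than the real schema, and the behaviour of `add_records_with_unknown_licenses` (adding a hypothetical `"unknown"` entry for every component without a record) is assumed.

```ruby
# Hypothetical in-memory stand-ins for components and pm_packages rows.
Component = Struct.new(:name, :purl_type, :version)
CompressedLicenses = Struct.new(:default_licenses, :other_versions, :other_licenses)
Package = Struct.new(:name, :purl_type, :licenses)

class CompressedLookupSketch
  attr_reader :records

  def initialize(components, packages)
    @components = components
    @packages = packages
    @records = []
  end

  def compressed_fetch
    @components.each do |component|
      # Stands in for: SELECT ... FROM pm_packages WHERE name = ? AND purl_type = ?
      packages = @packages.select do |package|
        package.name == component.name && package.purl_type == component.purl_type
      end

      packages.each do |package|
        # Versions listed in other_versions use other_licenses;
        # every other version falls back to default_licenses.
        licenses =
          if package.licenses.other_versions.include?(component.version)
            package.licenses.other_licenses
          else
            package.licenses.default_licenses
          end
        add_record_with_known_licenses(package.purl_type, package.name, component.version, licenses)
      end
    end

    add_records_with_unknown_licenses
    records
  end

  private

  def add_record_with_known_licenses(purl_type, name, version, licenses)
    @records << [purl_type, name, version, licenses]
  end

  # Assumed behaviour: components with no matching record get a
  # hypothetical "unknown" license entry.
  def add_records_with_unknown_licenses
    @components.each do |component|
      known = @records.any? do |purl_type, name, version, _licenses|
        purl_type == component.purl_type && name == component.name && version == component.version
      end
      @records << [component.purl_type, component.name, component.version, ["unknown"]] unless known
    end
  end
end

packages = [Package.new("rails", "gem", CompressedLicenses.new(["MIT"], ["1.0.0"], ["Apache-2.0"]))]
components = [
  Component.new("rails", "gem", "7.0.0"),   # not in other_versions: default licenses
  Component.new("rails", "gem", "1.0.0"),   # in other_versions: other licenses
  Component.new("left-pad", "npm", "1.3.0") # no package row: unknown
]
pp CompressedLookupSketch.new(components, packages).compressed_fetch
```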
Verification steps
Test data to be determined.
- Redundant case:
  - Insert the same test data into both datasets.
  - Assert that `uncompressed_fetch` and `compressed_fetch` return the same result, regardless of the setting for `compressed_package_metadata_query`.
- New instance case:
  - Insert data only into the new dataset.
  - Assert that `fetch` returns data correctly when `compressed_package_metadata_query` is enabled.
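The two verification cases can be illustrated with a small self-contained sketch. The datasets here are simplified, hypothetical in-memory hashes (one entry per version for the uncompressed data, one entry per package with default/other licenses for the compressed data), and `"unknown"` is a hypothetical sentinel for a missing match; the real test data is still to be determined.

```ruby
# Hypothetical simplified datasets; not the real table schemas.
UNCOMPRESSED = {
  ["gem", "rails", "1.0.0"] => ["Apache-2.0"],
  ["gem", "rails", "7.0.0"] => ["MIT"]
}.freeze

COMPRESSED = {
  ["gem", "rails"] => { default: ["MIT"], other_versions: ["1.0.0"], other: ["Apache-2.0"] }
}.freeze

def uncompressed_lookup(dataset, purl_type, name, version)
  dataset.fetch([purl_type, name, version], ["unknown"])
end

def compressed_lookup(dataset, purl_type, name, version)
  row = dataset[[purl_type, name]]
  return ["unknown"] unless row

  row[:other_versions].include?(version) ? row[:other] : row[:default]
end

# Mirrors the flag-based dispatch in fetch.
def lookup(flag_enabled, uncompressed, compressed, *key)
  flag_enabled ? compressed_lookup(compressed, *key) : uncompressed_lookup(uncompressed, *key)
end

queries = [["gem", "rails", "1.0.0"], ["gem", "rails", "7.0.0"], ["npm", "left-pad", "1.3.0"]]

# Redundant case: both datasets hold the same data, so both paths must agree.
queries.each do |query|
  unless uncompressed_lookup(UNCOMPRESSED, *query) == compressed_lookup(COMPRESSED, *query)
    raise "mismatch for #{query.inspect}"
  end
end

# New instance case: only the compressed dataset is populated.
raise "new instance case failed" unless lookup(true, {}, COMPRESSED, "gem", "rails", "7.0.0") == ["MIT"]
puts "verification sketch passed"
```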