Add service to import package metadata into the DB
Problem to solve
Data imported from the external license db needs to be stored in the instance database. The amount of imported data could be quite large, and the database tables for packages, their versions, and licenses have uniqueness constraints. Conflicts on those constraints must be handled so that the row IDs (whether newly created or already existing) can be stored in the associated tables.
Since data sizes are quite large, a bulk insert is required.
Proposal
Create a service that can ingest package-version-license rows, batch them, and store them in the database, ensuring that conflicts are handled correctly.
This service will be invoked by PackageMetadata::SyncService with a [package, version, license] data row. The service should indicate successful saves to the caller so that the caller can in turn update the last sync position to which the data belonged.
The pm_package_versions and pm_package_version_licenses tables store the ids of the rows with which they are associated (e.g. pm_package_id). Because of this, a mapping of the IDs of created packages and licenses needs to be maintained in order to populate the associated ids correctly.
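The caller contract described above (report success so that the caller can advance its sync position) could look roughly like the following. This is only a sketch: ImportResult, FakeImportService, and SyncCaller are hypothetical names invented for illustration, and the real PackageMetadata::SyncService may track its position differently.

```ruby
# Hypothetical stand-ins; none of these classes exist in the GitLab codebase.
ImportResult = Struct.new(:success, :saved_count) do
  def success?
    success
  end
end

class FakeImportService
  # `store` is any object responding to #call(rows); it raises on failure.
  def initialize(store)
    @store = store
  end

  # Accepts a batch of [package, version, license] rows and reports
  # whether the whole batch was saved.
  def execute(rows)
    @store.call(rows)
    ImportResult.new(true, rows.size)
  rescue StandardError
    ImportResult.new(false, 0)
  end
end

class SyncCaller
  attr_reader :last_position

  def initialize(service)
    @service = service
    @last_position = nil
  end

  # Advances the sync position only after the service confirms the save.
  def sync(position, rows)
    result = @service.execute(rows)
    @last_position = position if result.success?
    result
  end
end
```

With this shape, a failed batch leaves last_position untouched, so the same data can be retried on the next sync.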
Implementation Plan
- add PackageMetadata::Import::ImportService under ee/app/services/package_metadata/import
  - allow client code to #execute service with a batch of data
  - for each tuple call ingest on the appropriate task
Use Gitlab::Ingestion::BulkInsertableTask
This approach is quite a bit cleaner than the alternatives, but it has one caveat: an occurrence map of the ids encountered could take up a significant amount of memory. Care must be taken to keep the id mapping below a certain limit, perhaps by removing the least recently used values. This is OK since INSERT ... ON CONFLICT ... RETURNING id will always give back the correct data.
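One way to cap the id mapping is a small least-recently-used cache. The sketch below is an assumption about how this might be done, not part of the proposal: BoundedIdCache is a hypothetical name and the limit is whatever the caller chooses. It relies on Ruby hashes preserving insertion order, so the first key is always the least recently used one.

```ruby
# BoundedIdCache is a hypothetical name used only for this sketch.
class BoundedIdCache
  def initialize(limit)
    raise ArgumentError, "limit must be positive" unless limit.positive?

    @limit = limit
    @entries = {} # insertion-ordered: first key = least recently used
  end

  # Returns the cached id, or nil on a miss.
  # A hit re-inserts the key so it becomes the most recently used.
  def get(key)
    return nil unless @entries.key?(key)

    id = @entries.delete(key)
    @entries[key] = id
  end

  # Stores an id and evicts least recently used entries while over the limit.
  def put(key, id)
    @entries.delete(key)
    @entries[key] = id
    @entries.shift while @entries.size > @limit
    id
  end

  def size
    @entries.size
  end
end
```

On a miss, the caller would simply send the row through the bulk insert again and store the id that comes back.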
- add PackageMetadata::Import::OccurrenceMap to store mappings of ingested attributes.
  - { $name: { id: $pm_packages.id, versions: { $version: $pm_package_versions.id }, $license: }
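The nested mapping above could be held in plain Ruby hashes, roughly as follows. OccurrenceMapSketch is a hypothetical stand-in, not the actual PackageMetadata::Import::OccurrenceMap. The value stored for a license is not spelled out above, so the separate map from SPDX identifier to license id used here is purely an assumption.

```ruby
# Hypothetical stand-in for PackageMetadata::Import::OccurrenceMap.
class OccurrenceMapSketch
  def initialize
    @packages = {} # name => { id: Integer, versions: { version => Integer } }
    @licenses = {} # spdx_identifier => Integer (assumed layout)
  end

  def add_package(name, id)
    entry = (@packages[name] ||= { id: nil, versions: {} })
    entry[:id] = id
  end

  def package_id(name)
    @packages.dig(name, :id)
  end

  # Raises KeyError if the package has not been recorded yet.
  def add_version(name, version, id)
    @packages.fetch(name)[:versions][version] = id
  end

  def version_id(name, version)
    @packages.dig(name, :versions, version)
  end

  def add_license(spdx_identifier, id)
    @licenses[spdx_identifier] = id
  end

  def license_id(spdx_identifier)
    @licenses[spdx_identifier]
  end
end
```

Lookups for unknown packages, versions, or licenses return nil, which signals that the row still has to go through the insert to obtain its id.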
- using sbom ingestion as a template, create tasks under package_metadata/import/tasks directory in ee/app/services
  - PackageMetadata::Import::Tasks::Packages exposing id and name as unique attributes
    - implement after_ingest to populate package.id back to PackageMetadata::Import::OccurrenceMap
  - PackageMetadata::Import::Tasks::PackageVersions exposing id, pm_package_id, and version as unique attributes
    - implement after_ingest to populate package.id back to PackageMetadata::Import::OccurrenceMap
  - PackageMetadata::Import::Tasks::Licenses exposing id and spdx_identifier as unique attributes
    - implement after_ingest to populate package.id back to PackageMetadata::Import::OccurrenceMap
  - PackageMetadata::Import::Tasks::PackageVersionLicenses
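The task flow listed above (bulk upsert on unique attributes, then an after_ingest hook that records the returned ids) can be illustrated with a plain-Ruby simulation. This sketch does not use or reproduce the Gitlab::Ingestion::BulkInsertableTask API: FakeTable, SketchTask, PackagesTask, and PackageVersionsTask are hypothetical names, the database is replaced by an in-memory table, and a plain hash stands in for the occurrence map.

```ruby
# In-memory stand-in for a table with a uniqueness constraint. upsert_all
# imitates INSERT ... ON CONFLICT ... RETURNING id: existing rows keep their id.
class FakeTable
  def initialize(unique_by)
    @unique_by = unique_by
    @ids = {}
  end

  def upsert_all(rows)
    rows.map do |row|
      key = row.values_at(*@unique_by)
      @ids[key] ||= @ids.size + 1
    end
  end
end

# Hypothetical base task: bulk-upserts rows, then hands rows and ids to a hook.
class SketchTask
  def initialize(table, id_map)
    @table = table
    @id_map = id_map
  end

  def ingest(tuples)
    rows = tuples.map { |tuple| attributes(tuple) }.uniq
    ids = @table.upsert_all(rows)
    after_ingest(rows, ids)
  end
end

class PackagesTask < SketchTask
  def attributes((name, _version, _license))
    { name: name }
  end

  # Records each package id so that dependent tasks can look it up.
  def after_ingest(rows, ids)
    rows.zip(ids) { |row, id| @id_map[[:package, row[:name]]] = id }
  end
end

class PackageVersionsTask < SketchTask
  # Needs the package id recorded by PackagesTask, so it must run second.
  def attributes((name, version, _license))
    { pm_package_id: @id_map.fetch([:package, name]), version: version }
  end

  def after_ingest(rows, ids)
    rows.zip(ids) do |row, id|
      @id_map[[:version, row[:pm_package_id], row[:version]]] = id
    end
  end
end
```

Running PackagesTask before PackageVersionsTask on the same batch of [package, version, license] tuples mirrors the dependency between the tables: the version rows can only be built once the package ids are known.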
Note: this issue depends on Partition package metadata tables (#382567) which adds a column to 2 new tables:
- pm_package_versions gets a purl_type column
- pm_package_version_licenses gets a purl_type column
Testing
Shifting left, this could be covered by appropriate DB/unit tests. It is not a candidate for E2E testing in isolation; however, the overall new License DB would need to be tested.