Ingest advisory and affected package data to DB
What does this MR do and why?
This MR is similar to Add package metadata ingestion for version form... (!120027 - merged). However, instead of ingesting license data, it ingests advisory and affected package data.
Two tables are touched during ingestion: pm_advisories and pm_affected_packages.
1. Advisory data is collected from the slice of objects passed to the ingestion service and upserted into pm_advisories. The advisory_xid and source_xid keys determine whether a new record is inserted or an existing one is updated.
2. An advisory_map of advisory_xid => advisory database id is built for each upserted record.
3. Each advisory might have multiple affected packages, which we loop through and upsert into the pm_affected_packages table. Each pm_affected_packages record is linked to its parent advisory by setting pm_affected_packages.pm_advisory_id using the advisory_map from step 2.
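The three ingestion steps can be sketched in plain Ruby, with in-memory tables standing in for PostgreSQL. This is a minimal illustration under stated assumptions, not the service's code: InMemoryTable, ingest, the title field, and the affected-package conflict key (advisory id, purl_type, package_name, distro_version) are all hypothetical.

```ruby
# Hypothetical in-memory stand-in for a table with a unique key; the real
# service upserts into PostgreSQL (pm_advisories / pm_affected_packages).
class InMemoryTable
  attr_reader :rows

  def initialize
    @rows = {}    # unique key => row attributes (including :id)
    @next_id = 1
  end

  # Inserts a row, or updates the row that already has this unique key.
  # Returns the row's id in both cases, similar to an upsert ... RETURNING id.
  def upsert(key, attributes)
    if (row = @rows[key])
      row.merge!(attributes)
    else
      row = attributes.merge(id: @next_id)
      @next_id += 1
      @rows[key] = row
    end
    row[:id]
  end
end

# Hypothetical ingest method following the three steps described in the MR.
def ingest(slice, advisories, affected_packages)
  # Steps 1 and 2: upsert advisories on (advisory_xid, source_xid) and
  # build advisory_map of advisory_xid => advisory database id.
  advisory_map = {}
  slice.each do |obj|
    attrs = obj.reject { |k, _| k == :affected_packages }
    key = [obj[:advisory_xid], obj[:source_xid]]
    advisory_map[obj[:advisory_xid]] = advisories.upsert(key, attrs)
  end

  # Step 3: upsert each affected package and link it to its parent advisory.
  # The conflict key used here is an assumption for illustration.
  slice.each do |obj|
    advisory_id = advisory_map.fetch(obj[:advisory_xid])
    obj[:affected_packages].each do |pkg|
      key = [advisory_id, pkg[:purl_type], pkg[:package_name], pkg[:distro_version]]
      affected_packages.upsert(key, pkg.merge(pm_advisory_id: advisory_id))
    end
  end

  advisory_map
end
```

Calling ingest twice with the same slice leaves the row counts unchanged, because the second pass updates rows instead of inserting new ones.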
Database changes
This MR changes the pm_affected_packages.distro_version column to be NOT NULL with a DEFAULT value, as explained in this comment.
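The linked comment is not reproduced here. As general background only, and not necessarily the reasoning given there: by default, PostgreSQL unique indexes treat NULLs as distinct, so INSERT ... ON CONFLICT never detects a conflict on a key that contains NULL. If distro_version is part of the affected packages' conflict key, a NULL value would let repeated ingestion insert duplicates instead of updating existing rows. The sketch below only simulates that comparison rule; sql_keys_conflict? and upsert_count are hypothetical helpers.

```ruby
# Mimics how a default PostgreSQL unique index compares keys: a key that
# contains NULL (nil here) never conflicts with another key, because
# NULL = NULL does not evaluate to true.
def sql_keys_conflict?(a, b)
  return false if a.include?(nil) || b.include?(nil)
  a == b
end

# Hypothetical upsert: a key is stored unless it conflicts with an existing one.
def upsert_count(keys)
  stored = []
  keys.each do |key|
    stored << key unless stored.any? { |existing| sql_keys_conflict?(existing, key) }
  end
  stored.size
end

with_null    = [['npm', 'foo', nil], ['npm', 'foo', nil]]
with_default = [['npm', 'foo', ''], ['npm', 'foo', '']]
upsert_count(with_null)    # => 2, the "same" package is stored twice
upsert_count(with_default) # => 1
```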
Characteristics of ingested data
Initially, we'll only support gemnasium-db as a data source for advisories. The exported advisory data is currently around 30MB:
$ gsutil -m rsync -r -d gs://prod-export-advisory-bucket-1a6c642fc4de57d4 $GITLAB_RAILS_ROOT_DIR/vendor/package_metadata/advisories
$ du -h $GITLAB_RAILS_ROOT_DIR/vendor/package_metadata/advisories
5.0M vendor/package_metadata/advisories/v2/pypi
2.8M vendor/package_metadata/advisories/v2/go
7.7M vendor/package_metadata/advisories/v2/maven
2.4M vendor/package_metadata/advisories/v2/nuget
4.5M vendor/package_metadata/advisories/v2/packagist
716K vendor/package_metadata/advisories/v2/conan
5.1M vendor/package_metadata/advisories/v2/npm
2.1M vendor/package_metadata/advisories/v2/rubygem
30M vendor/package_metadata/advisories
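As a quick sanity check, the per-ecosystem figures add up to the reported total. The sketch below parses the du -h output shown above, treating K, M and G as 1024-based units (as du -h does), and sums it in MiB. The to_mib helper and the parsing approach are illustrative only, and since du -h rounds each figure, the sum is approximate.

```ruby
# Converts a `du -h` style size (1024-based suffix) into MiB.
def to_mib(size)
  value = size.to_f
  case size[-1]
  when 'K' then value / 1024
  when 'M' then value
  when 'G' then value * 1024
  else value / (1024 * 1024) # no suffix: bytes
  end
end

du_output = <<~DU
  5.0M vendor/package_metadata/advisories/v2/pypi
  2.8M vendor/package_metadata/advisories/v2/go
  7.7M vendor/package_metadata/advisories/v2/maven
  2.4M vendor/package_metadata/advisories/v2/nuget
  4.5M vendor/package_metadata/advisories/v2/packagist
  716K vendor/package_metadata/advisories/v2/conan
  5.1M vendor/package_metadata/advisories/v2/npm
  2.1M vendor/package_metadata/advisories/v2/rubygem
DU

# Map each ecosystem directory name to its size in MiB.
per_ecosystem = du_output.lines.to_h do |line|
  size, path = line.split
  [File.basename(path), to_mib(size)]
end

total = per_ecosystem.values.sum
puts format('%.1f MiB across %d ecosystems', total, per_ecosystem.size)
# prints "30.3 MiB across 8 ecosystems"
```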
Eventually, we'll support other sources of advisory data, such as trivy-db-glad, which is around 360MB:
$ oras pull registry.gitlab.com/gitlab-org/security-products/dependencies/trivy-db-glad:2
$ tar -xzf db.tar.gz
$ ls -alh trivy.db
-rw------- 1 adam wheel 361M Jul 10 13:11 trivy.db
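For scale, 361M / 30M is roughly 12 times the size of the current gemnasium-db export. The MR already hands data to the ingestion service in slices; as a hedged illustration of why that matters at this size, the sketch below consumes a lazy stream in fixed-size slices with Enumerable#each_slice, so only one slice is materialized at a time. SLICE_SIZE, ingest_in_slices and the generated records are hypothetical; the sync service's real slice size is not stated here.

```ruby
# Hypothetical slice size; not taken from the sync service.
SLICE_SIZE = 1_000

# Hands records from any enumerable source to the caller's block in
# fixed-size slices. If the source is lazy, only one slice is held in
# memory at a time. Returns the number of slices processed.
def ingest_in_slices(records, slice_size: SLICE_SIZE)
  slices = 0
  records.each_slice(slice_size) do |slice|
    yield slice
    slices += 1
  end
  slices
end

# A lazy, generated stream standing in for parsed advisory files.
records = (1..2_500).lazy.map { |i| { advisory_xid: "adv-#{i}" } }
sizes = []
count = ingest_in_slices(records) { |slice| sizes << slice.size }
# count == 3, sizes == [1000, 1000, 500]
```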
How to set up and validate locally
- Create a new directory for advisories in $GITLAB_RAILS_ROOT_DIR/vendor/package_metadata/advisories:
  mkdir -p $GITLAB_RAILS_ROOT_DIR/vendor/package_metadata/advisories
- Install the gsutil tool.
- Sync the package advisory bucket using gsutil:
  gsutil -m rsync -r -d gs://prod-export-advisory-bucket-1a6c642fc4de57d4 $GITLAB_RAILS_ROOT_DIR/vendor/package_metadata/advisories
- Open the Rails console and start the sync process:
  PM_SYNC_IN_DEV=true rails c
  [1] pry(main)> Feature.enable(:package_metadata_advisory_sync)
  [2] pry(main)> module PackageMetadata
    class MyAdvisoriesSyncWorker
      include ExclusiveLeaseGuard

      def lease_timeout
        5.minutes
      end

      def perform
        try_obtain_lease do
          SyncService.execute(data_type: 'advisories', lease: exclusive_lease)
        end
      end
    end
  end
  [3] pry(main)> PackageMetadata::MyAdvisoriesSyncWorker.new.perform
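The console worker relies on ExclusiveLeaseGuard so that only one advisory sync runs at a time. Roughly, try_obtain_lease runs its block only if no other process holds the lease, and the lease expires after lease_timeout (5 minutes here) even if it is never released. GitLab's lease is backed by Redis. The plain-Ruby InMemoryLease and with_lease below are hypothetical and only mimic those semantics within a single process.

```ruby
# Hypothetical in-memory lease: one holder at a time, with expiry.
class InMemoryLease
  def initialize(clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @clock = clock
    @expires_at = nil
    @mutex = Mutex.new
  end

  # Returns true if the lease was free (or had expired) and is now held.
  def try_obtain(timeout_seconds)
    @mutex.synchronize do
      now = @clock.call
      return false if @expires_at && now < @expires_at
      @expires_at = now + timeout_seconds
      true
    end
  end

  def release
    @mutex.synchronize { @expires_at = nil }
  end
end

# Runs the block only if the lease can be obtained, and releases the
# lease afterwards. Returns :skipped when another holder has it.
def with_lease(lease, timeout_seconds)
  return :skipped unless lease.try_obtain(timeout_seconds)

  begin
    yield
  ensure
    lease.release
  end
end

lease = InMemoryLease.new
inner_result = nil
outer_result = with_lease(lease, 300) do
  # A concurrent attempt while the first sync is running is skipped.
  inner_result = with_lease(lease, 300) { :ran }
  :synced
end
# outer_result == :synced, inner_result == :skipped
```

Passing a fake clock makes the expiry behavior easy to exercise without waiting five minutes.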
MR acceptance checklist
This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.
- I have evaluated the MR acceptance checklist for this MR.
Related to #406836 (closed)