Ingest advisory and affected package data to DB
What does this MR do and why?
Ingest advisory and affected package data to DB
This MR is similar to Add package metadata ingestion for version form... (!120027 - merged); however, instead of ingesting license data, it ingests advisory and affected_package data.
Two tables are touched during ingestion: pm_advisories and pm_affected_packages.
- Advisory data is collected from the slice of objects passed to the ingestion service and upserted into pm_advisories. The advisory_xid and source_xid keys are used to determine whether to insert a new record or update an existing one.
- An advisory_map of advisory_xid => advisory database id is built for each record upserted.
- Each advisory might have multiple affected packages, which we loop through and upsert into the pm_affected_packages table. Each pm_affected_packages record is linked to the parent advisory by setting the pm_affected_packages.pm_advisory_id value using the advisory_map from step 2.
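The three steps above can be sketched in plain Ruby. This is only an illustration under stated assumptions, not the service code in this MR: `InMemoryTable` and `ingest` are hypothetical names, the tables are simulated with hashes instead of database upserts, the foreign key is assumed to be called `pm_advisory_id`, and the unique key chosen for affected packages and the sample values are made up.

```ruby
# Hypothetical in-memory stand-in for a table with a unique index.
# The real ingestion upserts into the database; this only mimics the semantics.
class InMemoryTable
  attr_reader :rows

  def initialize(unique_by)
    @unique_by = unique_by # columns that identify an existing row
    @rows = {}             # unique key => row hash
    @next_id = 1
  end

  # Insert the row if its unique key is new, otherwise update it in place.
  # Returns the row id either way.
  def upsert(attrs)
    key = attrs.values_at(*@unique_by)
    row = (@rows[key] ||= { id: next_id! })
    row.merge!(attrs)
    row[:id]
  end

  private

  def next_id!
    id = @next_id
    @next_id += 1
    id
  end
end

# Hypothetical ingestion routine following the three steps described above.
def ingest(advisories, pm_advisories, pm_affected_packages)
  # Steps 1 and 2: upsert each advisory and record advisory_xid => id.
  advisory_map = advisories.each_with_object({}) do |advisory, map|
    map[advisory[:advisory_xid]] = pm_advisories.upsert(
      advisory_xid: advisory[:advisory_xid],
      source_xid: advisory[:source_xid],
      title: advisory[:title]
    )
  end

  # Step 3: upsert affected packages, linked to the parent via advisory_map.
  advisories.each do |advisory|
    advisory[:affected_packages].each do |package|
      parent_id = advisory_map.fetch(advisory[:advisory_xid])
      pm_affected_packages.upsert(package.merge(pm_advisory_id: parent_id))
    end
  end

  advisory_map
end

# Made-up sample data; the unique key for affected packages is an assumption.
pm_advisories = InMemoryTable.new(%i[advisory_xid source_xid])
pm_affected_packages = InMemoryTable.new(%i[pm_advisory_id purl_type package_name distro_version])

batch = [
  {
    advisory_xid: 'example-advisory-1',
    source_xid: 'example-source',
    title: 'Example advisory',
    affected_packages: [
      { purl_type: 'npm', package_name: 'example-pkg', distro_version: '' }
    ]
  }
]

advisory_map = ingest(batch, pm_advisories, pm_affected_packages)
# Ingesting the same batch again updates the existing rows instead of duplicating them.
ingest(batch, pm_advisories, pm_affected_packages)
```

Building the map while upserting means the affected packages never need a second lookup to find their parent id.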
Database changes
This MR updates the pm_affected_packages.distro_version column to DEFAULT NOT NULL, as explained in this comment.
Characteristics of ingested data
Initially, we'll only support gemnasium-db as a data source for advisories. The current size of the exported advisory data is around 30MB:
$ gsutil -m rsync -r -d gs://prod-export-advisory-bucket-1a6c642fc4de57d4 $GITLAB_RAILS_ROOT_DIR/vendor/package_metadata/advisories
$ du -h $GITLAB_RAILS_ROOT_DIR/vendor/package_metadata/advisories
5.0M vendor/package_metadata/advisories/v2/pypi
2.8M vendor/package_metadata/advisories/v2/go
7.7M vendor/package_metadata/advisories/v2/maven
2.4M vendor/package_metadata/advisories/v2/nuget
4.5M vendor/package_metadata/advisories/v2/packagist
716K vendor/package_metadata/advisories/v2/conan
5.1M vendor/package_metadata/advisories/v2/npm
2.1M vendor/package_metadata/advisories/v2/rubygem
30M vendor/package_metadata/advisories
Eventually, we'll support other sources of advisory data, such as trivy-db-glad, which is around 360MB:
$ oras pull registry.gitlab.com/gitlab-org/security-products/dependencies/trivy-db-glad:2
$ tar -xzf db.tar.gz
$ ls -alh trivy.db
-rw------- 1 adam wheel 361M Jul 10 13:11 trivy.db
How to set up and validate locally
- Create a new directory for advisories in $GITLAB_RAILS_ROOT_DIR/vendor/package_metadata/advisories:
  mkdir -p $GITLAB_RAILS_ROOT_DIR/vendor/package_metadata/advisories
- Install the gsutil tool.
- Sync the package advisory bucket using gsutil:
  gsutil -m rsync -r -d gs://prod-export-advisory-bucket-1a6c642fc4de57d4 $GITLAB_RAILS_ROOT_DIR/vendor/package_metadata/advisories
- Open the rails console and start the sync process:
  PM_SYNC_IN_DEV=true rails c
  [1] pry(main)> Feature.enable(:package_metadata_advisory_sync)
  [2] pry(main)> module PackageMetadata
    class MyAdvisoriesSyncWorker
      include ExclusiveLeaseGuard

      def lease_timeout
        5.minutes
      end

      def perform
        try_obtain_lease do
          SyncService.execute(data_type: 'advisories', lease: exclusive_lease)
        end
      end
    end
  end
  [3] pry(main)> PackageMetadata::MyAdvisoriesSyncWorker.new.perform
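For readers unfamiliar with the lease pattern used by the console worker, here is a simplified, self-contained sketch of the general idea: work runs only if a time-limited lease can be obtained, and concurrent attempts are skipped. `InMemoryLease` and `with_lease` are hypothetical names written for this illustration; they are not GitLab's `ExclusiveLeaseGuard` implementation, and the lease key is made up.

```ruby
# Hypothetical in-memory stand-in for an exclusive lease.
# Illustrative only: nothing here is shared between processes.
class InMemoryLease
  @held = {} # lease key => expiry time

  class << self
    attr_reader :held
  end

  def initialize(key, timeout:)
    @key = key
    @timeout = timeout # seconds
  end

  # Returns true if the lease was free (or expired) and is now held by us.
  def try_obtain(now = Time.now)
    expires_at = self.class.held[@key]
    return false if expires_at && expires_at > now

    self.class.held[@key] = now + @timeout
    true
  end

  # Releases the lease so the next caller can obtain it.
  def cancel
    self.class.held.delete(@key)
  end
end

# Runs the block only when the lease is obtained; otherwise returns :skipped.
def with_lease(key, timeout:)
  lease = InMemoryLease.new(key, timeout: timeout)
  return :skipped unless lease.try_obtain

  begin
    yield lease
  ensure
    lease.cancel
  end
end

# A nested attempt on the same key is skipped while the outer lease is held.
events = []
outer_result = with_lease('example:advisories-sync', timeout: 300) do
  events << :outer_ran
  events << with_lease('example:advisories-sync', timeout: 300) { events << :inner_ran }
  :done
end

# Once the outer block has finished, the lease can be obtained again.
second_result = with_lease('example:advisories-sync', timeout: 300) { :ran_again }
```

The 300-second timeout mirrors the `5.minutes` value in the console session; if a holder crashes without releasing, the lease still frees itself once that time passes.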
MR acceptance checklist
This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.
- I have evaluated the MR acceptance checklist for this MR.
Related to #406836 (closed)