Add worker to trigger package metadata advisory sync
Why are we doing this work
Add a scheduled worker and a sync service to ingest advisories exported by License DB into the vulnerability_advisories table of the main Postgres database.
See sync protocol discussed in #394723 (comment 1318712867)
Relevant links
Non-functional requirements
- Documentation: n/a
- Feature flag: n/a
- Performance: n/a
- Testing: n/a
Version format 2
Version format 2 uses ndjson to store data: v2/<purl_type>/<timestamp>/<chunk>.ndjson.
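As an illustration of the path layout above, here is a minimal sketch that splits a v2 object path into its parts. The helper name, the struct, and the sample values are hypothetical; the exact format of the timestamp and chunk segments is not specified here, so both are kept as opaque strings.

```ruby
# Hypothetical helper: splits a v2 export path into its components.
# Segment formats (timestamp, chunk) are assumptions and left as strings.
V2_PATH = %r{\Av2/(?<purl_type>[^/]+)/(?<timestamp>[^/]+)/(?<chunk>[^/]+)\.ndjson\z}

ExportPath = Struct.new(:purl_type, :timestamp, :chunk)

def parse_v2_path(path)
  match = V2_PATH.match(path)
  return nil unless match # not a v2 ndjson path

  ExportPath.new(match[:purl_type], match[:timestamp], match[:chunk])
end
```

Returning nil for anything that does not match lets a caller skip objects from other format versions instead of failing on them.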
Advisory data
Advisory data is stored one advisory per line. For example, { name: "rails", affected_range: ">=1.1.0 <1.1.6", identifier: "CVE-2006-4111", etc. }.
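A minimal sketch of reading such a chunk, assuming each line is a standalone JSON object (the example above is shorthand; ndjson lines use quoted keys). The method name is hypothetical, and the second sample line is a made-up placeholder rather than a real advisory.

```ruby
require 'json'

# Parses NDJSON text into an array of hashes, one per non-blank line.
def parse_ndjson(text)
  text.each_line.map(&:strip).reject(&:empty?).map do |line|
    JSON.parse(line)
  end
end

# First line mirrors the example above; the second is a placeholder.
sample = <<~NDJSON
  {"name":"rails","affected_range":">=1.1.0 <1.1.6","identifier":"CVE-2006-4111"}
  {"name":"example-pkg","affected_range":"<2.0.0","identifier":"EXAMPLE-0001"}
NDJSON

advisories = parse_ndjson(sample)
```

Because every line is independent, a chunk can be processed line by line without loading a whole JSON document into memory.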
The fields will correspond with the yaml fields in the advisory-database repo: https://gitlab.com/gitlab-org/security-products/gemnasium-db/-/blob/master/conan/bison/CVE-2020-14150.yml
See research spike for more info and discussion: #394723 (closed)
Full specification here: https://gitlab.com/gitlab-org/security-products/gemnasium-db/-/tree/master#yaml-schema
Implementation plan
- Work to update the connector and data object for protocol v2 is covered in Support advisories and affected packages data s... (#406323 - closed)
- Work to update ingestion is covered in Ingest advisory and affected package data into ... (#406836 - closed)
Sync worker
Advisories are synced in parallel with licenses because the two have different database access patterns (different tables and no foreign keys to existing data).
- Add PackageMetadata::SyncAdvisoriesWorker and extract PackageMetadata::SyncWorker functionality into PackageMetadata::SyncLicensesWorker
Sync service
- PackageMetadata::SyncService: add parsing of data based on data_type.
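One way to picture parsing based on data_type is a small lookup from data type to a record builder. This is only a sketch: the struct names, the license fields, and the 'advisories' and 'licenses' keys are assumptions for illustration, and the actual parsing is defined by the connector and data object work linked above.

```ruby
require 'json'

# Hypothetical record types; the real field sets come from the export format.
AdvisoryRecord = Struct.new(:name, :affected_range, :identifier, keyword_init: true)
LicenseRecord = Struct.new(:name, :licenses, keyword_init: true)

# Maps a data_type to a builder that turns a parsed JSON hash into a record.
BUILDERS = {
  'advisories' => lambda { |h|
    AdvisoryRecord.new(name: h['name'],
                       affected_range: h['affected_range'],
                       identifier: h['identifier'])
  },
  'licenses' => ->(h) { LicenseRecord.new(name: h['name'], licenses: h['licenses']) }
}.freeze

def parse_line(data_type, line)
  builder = BUILDERS.fetch(data_type) do
    raise ArgumentError, "unsupported data_type: #{data_type.inspect}"
  end
  builder.call(JSON.parse(line))
end
```

Failing loudly on an unknown data_type keeps a misconfigured sync from silently ingesting rows in the wrong shape.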
Add feature flag
- Add feature flag for advisory sync.
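A rough sketch of how the advisory worker could be gated behind the flag. FakeFeature is a stand-in for GitLab's Feature.enabled? so the snippet runs outside Rails; the flag name, the class name, and the sync callable are all hypothetical.

```ruby
# Stand-in for GitLab's Feature.enabled? so the sketch runs on its own.
module FakeFeature
  @flags = {}

  def self.enable(name)
    @flags[name] = true
  end

  def self.enabled?(name)
    @flags.fetch(name, false)
  end
end

# Hypothetical worker shape; the real one would be a scheduled Sidekiq
# worker that delegates to PackageMetadata::SyncService.
class SyncAdvisoriesWorkerSketch
  FLAG = :package_metadata_advisory_sync # hypothetical flag name

  def initialize(sync:)
    @sync = sync
  end

  def perform
    # Do nothing while the flag is off, so the cron schedule can ship first.
    return :skipped unless FakeFeature.enabled?(FLAG)

    @sync.call(data_type: 'advisories')
    :synced
  end
end
```

Checking the flag at the top of perform means the worker can be scheduled before advisory ingestion is switched on.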
See this MR for how data parsing will change: !120795 (diffs)
Some updates to sync_service may be required depending on the changes in the above MR.