Add worker to trigger package metadata advisory sync

Why are we doing this work

Add a scheduled worker and a sync service that ingest advisories exported by License DB into the vulnerability_advisories table of the main PostgreSQL database.

See the sync protocol discussed in #394723 (comment 1318712867).

Relevant links

Non-functional requirements

  • Documentation: n/a
  • Feature flag: n/a
  • Performance: n/a
  • Testing: n/a

Version format 2

Version format 2 stores data as NDJSON (newline-delimited JSON) files at the path v2/<purl_type>/<timestamp>/<chunk>.ndjson.
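A minimal Ruby sketch of how a consumer might parse and order object keys in this layout. It assumes that `<timestamp>` and `<chunk>` are decimal integers and that chunks should be processed in (timestamp, chunk) order; `ObjectKey` is a hypothetical name, not an existing GitLab class.

```ruby
# Hypothetical value object for an object key in the version format 2 layout:
#   v2/<purl_type>/<timestamp>/<chunk>.ndjson
# Assumes <timestamp> and <chunk> are decimal integers.
class ObjectKey
  include Comparable

  PATTERN = %r{\Av2/(?<purl_type>[^/]+)/(?<timestamp>\d+)/(?<chunk>\d+)\.ndjson\z}

  attr_reader :key, :purl_type, :timestamp, :chunk

  def self.parse(key)
    match = PATTERN.match(key)
    raise ArgumentError, "not a v2 object key: #{key}" unless match

    # to_i parses base 10, so a zero-padded chunk such as "0010" becomes 10.
    new(key, match[:purl_type], match[:timestamp].to_i, match[:chunk].to_i)
  end

  def initialize(key, purl_type, timestamp, chunk)
    @key = key
    @purl_type = purl_type
    @timestamp = timestamp
    @chunk = chunk
  end

  # Order by export timestamp, then by chunk number within one export.
  # Intended for comparing keys of a single purl_type.
  def <=>(other)
    [timestamp, chunk] <=> [other.timestamp, other.chunk]
  end

  def to_s
    key
  end
end

keys = %w[
  v2/npm/1685000000/2.ndjson
  v2/npm/1684000000/7.ndjson
  v2/npm/1685000000/1.ndjson
].map { |k| ObjectKey.parse(k) }

puts keys.sort.map(&:to_s)
# v2/npm/1684000000/7.ndjson
# v2/npm/1685000000/1.ndjson
# v2/npm/1685000000/2.ndjson
```

Keeping the raw key alongside the parsed parts means the original object name (including any zero padding) can still be used to fetch the file.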

Advisory data

Advisory data is stored one advisory per line, for example: {"name": "rails", "affected_range": ">=1.1.0 <1.1.6", "identifier": "CVE-2006-4111", ...} (other fields omitted).
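Because each line is a standalone JSON document, a chunk can be streamed line by line instead of being loaded whole. The sketch below assumes only the three fields from the example above; `Advisory` and `AdvisoryChunk` are hypothetical names, and skipping malformed lines (rather than failing the whole chunk) is an illustrative choice, not a decided behaviour.

```ruby
require "json"
require "stringio"

# Hypothetical value object covering only the fields shown in the example;
# real advisory lines carry more fields.
Advisory = Struct.new(:name, :affected_range, :identifier, keyword_init: true)

module AdvisoryChunk
  # Yields one Advisory per non-blank NDJSON line read from `io`.
  def self.each_advisory(io)
    return enum_for(:each_advisory, io) unless block_given?

    io.each_line.with_index(1) do |line, lineno|
      next if line.strip.empty?

      advisory = parse_line(line, lineno)
      yield advisory if advisory
    end
  end

  # Returns nil (and warns on stderr) for lines that are not valid JSON
  # or lack one of the expected keys.
  def self.parse_line(line, lineno)
    data = JSON.parse(line)
    Advisory.new(
      name: data.fetch("name"),
      affected_range: data.fetch("affected_range"),
      identifier: data.fetch("identifier")
    )
  rescue JSON::ParserError, KeyError => e
    warn "skipping line #{lineno}: #{e.message}"
    nil
  end
end

chunk = StringIO.new(<<~NDJSON)
  {"name": "rails", "affected_range": ">=1.1.0 <1.1.6", "identifier": "CVE-2006-4111"}
  not json
NDJSON

AdvisoryChunk.each_advisory(chunk).each { |a| puts a.identifier }
```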

The fields will correspond to the YAML fields in the advisory-database repo, for example: https://gitlab.com/gitlab-org/security-products/gemnasium-db/-/blob/master/conan/bison/CVE-2020-14150.yml

See the research spike for more information and discussion: #394723 (closed)

Full YAML schema specification: https://gitlab.com/gitlab-org/security-products/gemnasium-db/-/tree/master#yaml-schema

Implementation plan

Sync worker

Advisories are synced in parallel with licenses, by a separate worker, because the two have different database access patterns: they write to different tables, and advisories have no foreign keys to existing data.

  • Add PackageMetadata::SyncAdvisoriesWorker, and extract the PackageMetadata::SyncWorker functionality into PackageMetadata::SyncLicensesWorker.
    • Add a cron entry for this worker, similar to the existing one.
    • Add a queue entry for this worker, similar to the existing one.
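One way to picture the split is a shared base class whose subclasses differ only in the data type they ask the sync service to process. This is a plain-Ruby sketch under that assumption: `BaseSyncWorker`, `DATA_TYPE`, and the `SyncService.execute(data_type:)` signature are hypothetical, and the real classes would be Sidekiq workers with the cron and queue entries listed above. Threads stand in for the two independently scheduled jobs.

```ruby
module PackageMetadata
  # Stand-in for the real sync service (hypothetical signature); it only
  # records which data type it was asked to sync.
  class SyncService
    MUTEX = Mutex.new

    def self.calls
      @calls ||= []
    end

    def self.execute(data_type:)
      MUTEX.synchronize { calls << data_type }
    end
  end

  # Behaviour extracted from the existing SyncWorker (sketch only).
  class BaseSyncWorker
    def perform
      SyncService.execute(data_type: self.class::DATA_TYPE)
    end
  end

  class SyncLicensesWorker < BaseSyncWorker
    DATA_TYPE = :licenses
  end

  class SyncAdvisoriesWorker < BaseSyncWorker
    DATA_TYPE = :advisories
  end
end

# The two jobs share no tables, so neither needs to wait for the other.
workers = [PackageMetadata::SyncLicensesWorker, PackageMetadata::SyncAdvisoriesWorker]
workers.map { |klass| Thread.new { klass.new.perform } }.each(&:join)

p PackageMetadata::SyncService.calls.sort
```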

Sync service

Add feature flag

  • Add a feature flag to gate the advisory sync.
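A sketch of the gating logic, assuming the worker returns early when the flag is off. `FeatureFlags` is an in-memory stand-in for a flag store (GitLab would use its own Feature API), and the flag name `package_metadata_advisory_sync` and class `AdvisorySyncJob` are hypothetical.

```ruby
# In-memory stand-in for a feature flag store (hypothetical).
module FeatureFlags
  @flags = {}

  def self.enable(name)
    @flags[name] = true
  end

  def self.disable(name)
    @flags[name] = false
  end

  def self.enabled?(name)
    @flags.fetch(name, false) # flags are off unless explicitly enabled
  end
end

class AdvisorySyncJob
  FLAG = :package_metadata_advisory_sync # hypothetical flag name

  def perform
    return :skipped unless FeatureFlags.enabled?(FLAG)

    :synced # placeholder for the call into the sync service
  end
end

puts AdvisorySyncJob.new.perform # prints "skipped"
FeatureFlags.enable(AdvisorySyncJob::FLAG)
puts AdvisorySyncJob.new.perform # prints "synced"
```

Checking the flag inside `perform`, rather than removing the cron entry, lets the sync be switched on or off without a deploy.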

See this MR for how data parsing will change: !120795 (diffs)

Some updates to sync_service may be required depending on the changes in the above MR.
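For orientation, here is one possible shape of a resumable sync loop over the version format 2 files. It assumes, without confirmation from the linked protocol discussion, that the service stores a checkpoint of the last fully ingested (timestamp, chunk) pair per purl type and resumes after it. `Checkpoint`, `sync_chunks`, and the `read`/`ingest` callables are hypothetical names used only for this sketch.

```ruby
require "json"

# Hypothetical checkpoint: the last (timestamp, chunk) pair fully ingested.
Checkpoint = Struct.new(:timestamp, :chunk)

KEY_PATTERN = %r{\Av2/[^/]+/(\d+)/(\d+)\.ndjson\z}

# Processes the keys of one purl type in (timestamp, chunk) order, skipping
# everything at or before `checkpoint`. `read` returns the NDJSON body for a
# key; `ingest` persists the parsed rows. Returns the updated checkpoint.
def sync_chunks(keys, checkpoint, read:, ingest:)
  pending = keys.filter_map do |key|
    match = KEY_PATTERN.match(key)
    next unless match

    [match[1].to_i, match[2].to_i, key]
  end

  pending.sort.each do |timestamp, chunk, key|
    position = [timestamp, chunk]
    next if checkpoint && (position <=> [checkpoint.timestamp, checkpoint.chunk]) <= 0

    rows = read.call(key).each_line.reject { |l| l.strip.empty? }.map { |l| JSON.parse(l) }
    ingest.call(rows)
    # Advance only after the whole chunk is ingested, so a failure resumes here.
    checkpoint = Checkpoint.new(timestamp, chunk)
  end

  checkpoint
end
```

Advancing the checkpoint per chunk rather than per file listing keeps a crashed run from re-ingesting chunks it already finished.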

Edited by Igor Frenkel