Introduce a sync mechanism for EPSS scores
Introduction
The goal is to add EPSS support to the `package_metadata` flow in the GitLab backend. See an overview of the flow.
Notes
- Initially, delta mechanisms will not be used and all EPSS data will be uploaded daily to the PMDB bucket. As such, checkpoints may be redundant. See #467672 (comment 1982236484).
- The only relevant file in the bucket will be `<bucket>/v2/epss/<timestamp>/000000000.ndjson`.
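As a rough illustration of the file layout in the note above, the following sketch builds the object path and reads an NDJSON payload one record per line. The helper names, the sample bucket and timestamp values, and the record fields (`cve_id`, `score`) are all assumptions made for illustration; the actual export schema is not specified in this issue.

```ruby
require 'json'
require 'stringio'

# Builds the single object path described in the note above.
# `bucket` and `timestamp` are placeholders supplied by the caller.
def epss_object_path(bucket, timestamp)
  "#{bucket}/v2/epss/#{timestamp}/000000000.ndjson"
end

# Parses NDJSON: one JSON document per non-empty line.
def parse_ndjson(io)
  io.each_line.map(&:strip).reject(&:empty?).map { |line| JSON.parse(line) }
end

# Hypothetical sample content; the field names are assumptions.
sample = StringIO.new(<<~NDJSON)
  {"cve_id":"CVE-0000-0001","score":0.5}
  {"cve_id":"CVE-0000-0002","score":0.25}
NDJSON

records = parse_ndjson(sample)
```

Because all EPSS data is uploaded daily as one file, a reader along these lines would only need to locate the newest `<timestamp>` directory and parse that single object.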
Implementation
Overview
The flow of `package_metadata` on the GitLab side is:
- Cronjob executes the relevant data type worker (licenses, advisories, epss).
- The worker runs the `SyncService`, which handles the `package_metadata` flow for each purl type. Since EPSS is its own type, we need to consider how it may look different in this area.
- `SyncService` retrieves a `SyncConfiguration` for the relevant data type.
- `SyncService` uses the relevant connector (offline or GCP) to iterate over all new files (chunks) in the bucket since the last checkpoint.
- `SyncService` executes `IngestionService` for the given data type.
- The `IngestionService` runs a set of `IngestionTask` instances.
- Each `IngestionTask` parses and upserts the given data.
- The checkpoint is updated to reflect that we have progressed and data has been ingested.
- Continue until all data has been inserted or a stop signal is received.
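
The loop described in the steps above can be sketched roughly as follows. This is a simplified model under stated assumptions, not the real implementation: `Chunk`, `InMemoryConnector`, and `SketchSyncService` are hypothetical stand-ins, the checkpoint is reduced to a single sequence number, and the ingestion step simply collects records instead of running ingestion tasks.

```ruby
# Stand-in types; none of these mirror the real GitLab classes.
Chunk = Struct.new(:sequence, :records)

class InMemoryConnector
  def initialize(chunks)
    @chunks = chunks
  end

  # Yields chunks newer than the checkpoint; a nil checkpoint means "everything".
  def each_chunk_after(checkpoint)
    @chunks.each do |chunk|
      next if checkpoint && chunk.sequence <= checkpoint

      yield chunk
    end
  end
end

class SketchSyncService
  attr_reader :checkpoint, :ingested

  def initialize(connector, checkpoint: nil, stop_signal: -> { false })
    @connector = connector
    @checkpoint = checkpoint
    @stop_signal = stop_signal
    @ingested = []
  end

  def execute
    @connector.each_chunk_after(@checkpoint) do |chunk|
      ingest(chunk)                 # stands in for IngestionService and its tasks
      @checkpoint = chunk.sequence  # record progress only after ingestion
      break if @stop_signal.call    # stop early when asked to
    end
  end

  private

  def ingest(chunk)
    @ingested.concat(chunk.records)
  end
end

connector = InMemoryConnector.new(
  [Chunk.new(1, %w[a b]), Chunk.new(2, %w[c]), Chunk.new(3, %w[d])]
)
service = SketchSyncService.new(connector, checkpoint: 1)
service.execute
```

In this model, updating the checkpoint only after a chunk has been ingested means an interrupted run resumes from the last fully processed chunk.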
This issue focuses on the `SyncService` and `SyncConfiguration`, which execute the ingestion.
Tasks
Sync
- Add CVE Enrichment support to `ee/app/models/package_metadata/sync_configuration.rb`.
  - Add `cve_enrichment` to `configs_for`.
  - Implement `self.cve_enrichment_configs` similar to `self.advisory_configs`.
  - Add a `cve_enrichment?` function.
- Add support for CVE Enrichment in `ee/app/services/package_metadata/sync_service.rb`.
  - Add a `cve_enrichment` flow under `ingest`.
  - Following #467672 (comment 1982236484), return a nil checkpoint value from `checkpoint` so that all existing data is ingested.
- Test! You may create a CVE Enrichment object in `ee/spec/factories/package_metadata`, similarly to `ee/spec/factories/package_metadata/advisory_data_objects.rb`.
  - Add a CVE Enrichment context to `ee/spec/models/package_metadata/sync_configuration_spec.rb`.
  - Add CVE Enrichment flows to `ee/spec/services/package_metadata/sync_service_spec.rb`.
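
A minimal sketch of how the Sync tasks above might fit together, assuming a much simplified configuration object. `SketchSyncConfiguration`, its attributes, the `example-bucket` value, the purl types listed, and the `checkpoint_for` helper are hypothetical; the real `SyncConfiguration` and `SyncService` have their own structure, and only the method names `configs_for`, `advisory_configs`, `cve_enrichment_configs`, and `cve_enrichment?` come from this issue.

```ruby
class SketchSyncConfiguration
  attr_reader :data_type, :base_uri, :version_format, :path_component

  def initialize(data_type, base_uri, version_format, path_component)
    @data_type = data_type
    @base_uri = base_uri
    @version_format = version_format
    @path_component = path_component
  end

  def self.configs_for(data_type)
    case data_type
    when 'advisories' then advisory_configs
    when 'cve_enrichment' then cve_enrichment_configs
    else raise ArgumentError, "unknown data type: #{data_type}"
    end
  end

  # One configuration per purl type (the purl types listed are illustrative).
  def self.advisory_configs
    %w[gem npm].map { |purl| new('advisories', 'example-bucket', 'v2', purl) }
  end

  # Assumption: EPSS is not split by purl type, so a single configuration
  # pointing at the `epss` directory is enough.
  def self.cve_enrichment_configs
    [new('cve_enrichment', 'example-bucket', 'v2', 'epss')]
  end

  def cve_enrichment?
    data_type == 'cve_enrichment'
  end
end

# Hypothetical helper: a nil checkpoint means "ingest everything", which
# matches the decision to upload the full EPSS data set daily.
def checkpoint_for(config, stored_checkpoint)
  return nil if config.cve_enrichment?

  stored_checkpoint
end

epss_config = SketchSyncConfiguration.configs_for('cve_enrichment').first
```

Spec contexts for the new data type could then assert the same properties shown here: one configuration is returned, the predicate is true, and the checkpoint is nil.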
Execution
- Create a feature flag for EPSS syncing.
- Create `cve_enrichment_sync_worker.rb` under `ee/app/workers/package_metadata`, similarly to `ee/app/workers/package_metadata/advisories_sync_worker.rb`, to execute the `SyncService`.
  - The worker should only run if the feature flag is enabled.
- Test! You may create a CVE Enrichment object in `ee/spec/factories/package_metadata`, similarly to `ee/spec/factories/package_metadata/advisory_data_objects.rb`.
  - Implement `ee/spec/workers/package_metadata/cve_enrichment_sync_worker_spec.rb`.
- Add a cronjob for the CVE Enrichment sync worker to `config/initializers/1_settings.rb`, similar to `package_metadata_advisories_sync_worker`. The cronjob specifies the worker to run every 5 minutes.
- Regenerate `ee/app/workers/all_queues.yml` with the new cronjob changes (see sidekiq queues).
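
The Execution tasks above could look roughly like the sketch below. It is self-contained and therefore uses stand-ins: `SketchFeature` is a hypothetical replacement for the feature-flag check, `SketchCveEnrichmentSyncWorker` takes the sync step as an injected callable instead of calling the real `SyncService`, the flag name `:sketch_epss_sync` is invented, and a plain Hash stands in for the cron settings. Only the five-minute schedule comes from this issue.

```ruby
# Stand-in for a feature-flag lookup; hypothetical, not GitLab's Feature API.
module SketchFeature
  @enabled = {}

  class << self
    def enable(name)
      @enabled[name] = true
    end

    def enabled?(name)
      @enabled.fetch(name, false)
    end
  end
end

class SketchCveEnrichmentSyncWorker
  FLAG = :sketch_epss_sync # hypothetical flag name

  # `sync` is any callable that runs the sync for a given data type.
  def initialize(sync)
    @sync = sync
  end

  def perform
    # The worker should only run if the feature flag is enabled.
    return unless SketchFeature.enabled?(FLAG)

    @sync.call('cve_enrichment')
  end
end

# Plain Hash standing in for the cron settings; the key name is illustrative.
cron_jobs = {}
cron_jobs['sketch_cve_enrichment_sync_worker'] ||= {}
cron_jobs['sketch_cve_enrichment_sync_worker']['cron'] ||= '*/5 * * * *' # every 5 minutes
cron_jobs['sketch_cve_enrichment_sync_worker']['job_class'] = 'SketchCveEnrichmentSyncWorker'
```

A worker spec would typically cover both branches: no sync when the flag is disabled, and exactly one sync call for the `cve_enrichment` data type when it is enabled.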
Edited by Yasha Rise