Introduce a sync mechanism for EPSS scores
Introduction
The goal is to add EPSS support to the package_metadata flow in the GitLab backend. See an overview of the flow.
Notes
- Initially, delta mechanisms will not be used and all EPSS data will be uploaded daily to the PMDB bucket. As such, checkpoints may be redundant. See #467672 (comment 1982236484).
- The only relevant file in the bucket will be <bucket>/v2/epss/<timestamp>/000000000.ndjson.
Implementation
Overview
The flow of package_metadata on the GitLab side is:
- Cronjob executes the relevant data type worker (licenses, advisories, epss).
- The worker runs the SyncService, which handles the package_metadata flow for each purl type. Since EPSS is its own type, we need to consider how it may look different in this area.
- SyncService retrieves a SyncConfiguration for the relevant data type.
- SyncService uses the relevant connector (offline or GCP) to iterate over all new files (chunks) in the bucket since the last checkpoint.
- SyncService executes IngestionService for the given data type.
- The IngestionService runs a set of IngestionTask.
- Each IngestionTask parses and upserts the given data.
- The checkpoint is updated to reflect that we have progressed and data has been ingested.
- Continue until all data has been inserted or a stop signal is received.
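The loop described above can be sketched in plain Ruby. This is only an illustration under stated assumptions: SketchConnector, SketchSyncService, DataFile, and the [sequence, chunk] checkpoint shape are hypothetical stand-ins and do not mirror the actual classes in the GitLab codebase.

```ruby
# Hypothetical stand-ins for illustration; they do not mirror the real GitLab classes.
DataFile = Struct.new(:sequence, :chunk, :records)

class SketchConnector
  def initialize(files)
    @files = files
  end

  # Returns files positioned strictly after the checkpoint.
  # A nil checkpoint means "no progress recorded", so every file is returned.
  def data_after(checkpoint)
    @files.select do |file|
      checkpoint.nil? || ([file.sequence, file.chunk] <=> checkpoint) == 1
    end
  end
end

class SketchSyncService
  attr_reader :checkpoint, :ingested

  def initialize(connector, checkpoint: nil, stop_signal: -> { false })
    @connector = connector
    @checkpoint = checkpoint
    @stop_signal = stop_signal
    @ingested = []
  end

  def execute
    @connector.data_after(@checkpoint).each do |file|
      @ingested.concat(file.records)            # stands in for IngestionService
      @checkpoint = [file.sequence, file.chunk] # record progress only after ingestion
      break if @stop_signal.call                # stop early when asked to
    end
  end
end

files = [
  DataFile.new(1, 0, %w[a]),
  DataFile.new(1, 1, %w[b]),
  DataFile.new(2, 0, %w[c])
]
sync = SketchSyncService.new(SketchConnector.new(files), checkpoint: [1, 0])
sync.execute
```

With a checkpoint of [1, 0], only the two later files are ingested. Passing a nil checkpoint makes the same loop ingest every file, which is the behaviour the Notes section proposes for EPSS.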
This issue focuses on the SyncService and SyncConfiguration which execute the ingestion.
Tasks
Sync
- Add CVE Enrichment support to ee/app/models/package_metadata/sync_configuration.rb.
  - Add cve_enrichment to configs_for.
  - Implement self.cve_enrichment_configs similar to self.advisory_configs.
  - Add a cve_enrichment? function.
- Add support for CVE Enrichment in ee/app/services/package_metadata/sync_service.rb.
  - Add a cve_enrichment flow under ingest.
  - Following #467672 (comment 1982236484), return with a nil checkpoint value from checkpoint to ingest all existing data.
- Test! You may create a CVE Enrichment object in ee/spec/factories/package_metadata similarly to ee/spec/factories/package_metadata/advisory_data_objects.rb.
  - Add a CVE Enrichment context to ee/spec/models/package_metadata/sync_configuration_spec.rb.
  - Add CVE Enrichment flows to ee/spec/services/package_metadata/sync_service_spec.rb.
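A minimal sketch of the Sync tasks, assuming a simplified configuration object. SketchSyncConfiguration, its attributes, the example-bucket name, the example purl types, and the checkpoint_for helper are all hypothetical; only the method names configs_for, advisory_configs, cve_enrichment_configs, and cve_enrichment? come from the task list, and their real signatures may differ.

```ruby
# Hypothetical sketch; attribute names, bucket name, and purl types are assumptions.
class SketchSyncConfiguration
  attr_reader :data_type, :base_uri, :purl_type

  def initialize(data_type:, base_uri:, purl_type: nil)
    @data_type = data_type
    @base_uri = base_uri
    @purl_type = purl_type
  end

  def self.configs_for(data_type)
    case data_type
    when 'advisories' then advisory_configs
    when 'cve_enrichment' then cve_enrichment_configs
    else raise ArgumentError, "unsupported data type: #{data_type}"
    end
  end

  # One configuration per purl type (the purl types here are examples only).
  def self.advisory_configs
    %w[npm maven].map do |purl_type|
      new(data_type: 'advisories', base_uri: 'example-bucket/v2', purl_type: purl_type)
    end
  end

  # EPSS is not split by purl type, so a single configuration is enough.
  def self.cve_enrichment_configs
    [new(data_type: 'cve_enrichment', base_uri: 'example-bucket/v2/epss')]
  end

  def cve_enrichment?
    data_type == 'cve_enrichment'
  end
end

# Returns nil for CVE Enrichment so that every file in the bucket is ingested;
# other data types keep whatever checkpoint was stored.
def checkpoint_for(config, stored_checkpoint)
  config.cve_enrichment? ? nil : stored_checkpoint
end

epss_config = SketchSyncConfiguration.configs_for('cve_enrichment').first
checkpoint_for(epss_config, [1, 0]) # nil, so the sync starts from the beginning
```

Returning a single configuration without a purl type reflects that EPSS is its own type. Whether the real class should model it this way is an open design question for this issue.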
Execution
- Create a feature flag for EPSS syncing.
- Create cve_enrichment_sync_worker.rb under ee/app/workers/package_metadata, similarly to ee/app/workers/package_metadata/advisories_sync_worker.rb, to execute the SyncService.
  - The worker should only run if the feature flag is enabled.
- Test! You may create a CVE Enrichment object in ee/spec/factories/package_metadata similarly to ee/spec/factories/package_metadata/advisory_data_objects.rb.
  - Implement ee/spec/workers/package_metadata/cve_enrichment_sync_worker_spec.rb.
- Add cronjob for CVE Enrichment sync worker to config/initializers/1_settings.rb, similar to package_metadata_advisories_sync_worker. The cronjob specifies the worker to run every 5 minutes.
- Regenerate ee/app/workers/all_queues.yml with new cronjob changes (see sidekiq queues).
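The feature-flag gating of the worker might look roughly like the sketch below. It is standalone Ruby rather than the real worker: SketchFeature is a stub for a feature flag lookup, the flag name :cve_enrichment_sync is hypothetical, and the sync lambda stands in for executing SyncService.

```ruby
# Stub standing in for a feature flag lookup; the flag name used below is hypothetical.
module SketchFeature
  @enabled = {}

  def self.enable(name)
    @enabled[name] = true
  end

  def self.enabled?(name)
    @enabled.fetch(name, false)
  end
end

class SketchCveEnrichmentSyncWorker
  # Standard cron expression for "every 5 minutes".
  CRON = '*/5 * * * *'

  def initialize(sync: -> { :synced })
    @sync = sync
  end

  def perform
    # Do nothing unless the feature flag is enabled.
    return unless SketchFeature.enabled?(:cve_enrichment_sync)

    @sync.call # stands in for executing SyncService
  end
end

worker = SketchCveEnrichmentSyncWorker.new
worker.perform # returns nil while the flag is disabled
```

Keeping the flag check at the top of perform means the cronjob can be registered before the flag is enabled, since each scheduled run is a no-op until then.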
Edited by Yasha Rise