
Introduce a sync mechanism for EPSS scores

Introduction

The goal is to add EPSS support to the package_metadata flow in the GitLab backend. See an overview of the flow.

Notes

  • Initially, delta mechanisms will not be used and all EPSS data will be uploaded daily to the PMDB bucket. As such, checkpoints may be redundant. See #467672 (comment 1982236484).
    • The only relevant file in the bucket will be <bucket>/v2/epss/<timestamp>/000000000.ndjson.

Implementation

Overview

The flow of package_metadata on the GitLab side is:

  1. Cronjob executes the relevant data type worker (licenses, advisories, epss).
  2. The worker runs the SyncService, which handles the package_metadata flow for each purl type. EPSS is its own type rather than a purl type, so this step may need to look different for EPSS.
  3. SyncService retrieves a SyncConfiguration for the relevant data type.
  4. SyncService uses the relevant connector (offline or GCP) to iterate over all new files (chunks) in the bucket since the last checkpoint.
  5. SyncService executes IngestionService for the given data type.
  6. The IngestionService runs a set of IngestionTasks.
  7. Each IngestionTask parses and upserts the given data.
  8. The checkpoint is updated to reflect that we have progressed and data has been ingested.
  9. Continue until all data has been inserted or a stop signal is received.
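
The steps above can be sketched as a loop over new files with a checkpoint update after each one. This is a minimal illustration, not the actual implementation: SketchConnector, SketchSyncService, Checkpoint and DataFile are hypothetical stand-ins, and the real connectors, services and models have different interfaces.

```ruby
# Simplified stand-ins for illustration; not the real GitLab classes.
Checkpoint = Struct.new(:sequence, :chunk)
DataFile = Struct.new(:sequence, :chunk, :records)

# Hypothetical in-memory connector standing in for the offline or GCP connector.
class SketchConnector
  def initialize(files)
    @files = files
  end

  # Returns the files that are newer than the checkpoint, oldest first.
  # A checkpoint whose sequence is nil means "start from the beginning".
  def data_after(checkpoint)
    sorted = @files.sort_by { |file| [file.sequence, file.chunk] }
    return sorted if checkpoint.sequence.nil?

    position = [checkpoint.sequence, checkpoint.chunk]
    sorted.select { |file| ([file.sequence, file.chunk] <=> position) > 0 }
  end
end

class SketchSyncService
  attr_reader :checkpoint, :ingested

  def initialize(connector:, checkpoint:, stop_signal: -> { false })
    @connector = connector
    @checkpoint = checkpoint
    @stop_signal = stop_signal
    @ingested = []
  end

  def execute
    @connector.data_after(@checkpoint).each do |file|
      ingest(file)
      # Record progress only after the file has been ingested.
      @checkpoint = Checkpoint.new(file.sequence, file.chunk)
      break if @stop_signal.call
    end
  end

  private

  # Stand-in for IngestionService and its IngestionTasks (parse and upsert).
  def ingest(file)
    @ingested.concat(file.records)
  end
end

files = [
  DataFile.new(1, 0, ['record-a']),
  DataFile.new(2, 0, ['record-b']),
  DataFile.new(3, 0, ['record-c'])
]
service = SketchSyncService.new(
  connector: SketchConnector.new(files),
  checkpoint: Checkpoint.new(1, 0)
)
service.execute
puts service.ingested.inspect
```

Starting from Checkpoint.new(1, 0) skips the first file and ingests the other two. Starting from Checkpoint.new(nil, nil) ingests everything, which is the behaviour the notes describe for EPSS.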

This issue focuses on the SyncService and SyncConfiguration which execute the ingestion.

Tasks

Sync

  • Add CVE Enrichment support to ee/app/models/package_metadata/sync_configuration.rb.
    • Add cve_enrichment to configs_for.
    • Implement self.cve_enrichment_configs similar to self.advisory_configs.
    • Add a cve_enrichment? function.
  • Add support for CVE Enrichment in ee/app/services/package_metadata/sync_service.rb.
    • Add a cve_enrichment flow under ingest.
    • Following #467672 (comment 1982236484), have checkpoint return a nil checkpoint value so that all existing data is ingested.
  • Test! You may create a CVE Enrichment object in ee/spec/factories/package_metadata similarly to ee/spec/factories/package_metadata/advisory_data_objects.rb.
    • Add a CVE Enrichment context to ee/spec/models/package_metadata/sync_configuration_spec.rb.
    • Add CVE Enrichment flows to ee/spec/services/package_metadata/sync_service_spec.rb.
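
As a rough illustration of the Sync tasks, the sketch below shows one possible shape for the configuration lookup and the nil checkpoint. Everything here is an assumption made for the example: SketchSyncConfiguration, its attributes (bucket, version_format, path_component), SketchCheckpoint and checkpoint_for are hypothetical, and the real sync_configuration.rb and sync_service.rb will differ. Only the names configs_for, cve_enrichment_configs and cve_enrichment? come from the task list, and the v2/epss path comes from the notes.

```ruby
# Hypothetical, simplified model. The real class is
# ee/app/models/package_metadata/sync_configuration.rb and will differ.
class SketchSyncConfiguration
  attr_reader :data_type, :bucket, :version_format, :path_component

  def initialize(data_type:, bucket:, version_format:, path_component:)
    @data_type = data_type
    @bucket = bucket
    @version_format = version_format
    @path_component = path_component
  end

  # Only the new data type is shown; the existing types are omitted here.
  def self.configs_for(data_type, bucket:)
    case data_type
    when 'cve_enrichment' then cve_enrichment_configs(bucket)
    else raise ArgumentError, "unsupported data type: #{data_type}"
    end
  end

  # Assumption: EPSS data lives in a single location, so one configuration.
  def self.cve_enrichment_configs(bucket)
    [new(data_type: 'cve_enrichment', bucket: bucket,
         version_format: 'v2', path_component: 'epss')]
  end

  def cve_enrichment?
    data_type == 'cve_enrichment'
  end

  # Prefix under which <timestamp>/000000000.ndjson is expected.
  def prefix
    [bucket, version_format, path_component].join('/')
  end
end

SketchCheckpoint = Struct.new(:sequence, :chunk)

# Assumed reading of the task: for CVE Enrichment, ignore any stored
# checkpoint and return an empty one so that all existing data is ingested.
def checkpoint_for(config, stored_checkpoint)
  return SketchCheckpoint.new(nil, nil) if config.cve_enrichment?

  stored_checkpoint
end

config = SketchSyncConfiguration.configs_for('cve_enrichment', bucket: 'example-bucket').first
puts config.prefix
```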

Execution

  • Create a feature flag for EPSS syncing.
  • Create cve_enrichment_sync_worker.rb under ee/app/workers/package_metadata, similar to ee/app/workers/package_metadata/advisories_sync_worker.rb, to execute the SyncService.
    • The worker should only run if the feature flag is enabled.
  • Test! You may create a CVE Enrichment object in ee/spec/factories/package_metadata similarly to ee/spec/factories/package_metadata/advisory_data_objects.rb.
    • Implement ee/spec/workers/package_metadata/cve_enrichment_sync_worker_spec.rb.
  • Add a cronjob for the CVE Enrichment sync worker to config/initializers/1_settings.rb, similar to package_metadata_advisories_sync_worker. The cronjob schedules the worker to run every 5 minutes.
  • Regenerate ee/app/workers/all_queues.yml with the new cronjob changes (see sidekiq queues).
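
The feature flag gating described above can be sketched as follows. This is an illustration under stated assumptions, not the actual worker: SketchFeature is a hypothetical in-memory stand-in for the feature flag check, the flag name is invented for the example, and the sync step is passed in as a callable instead of invoking the real SyncService.

```ruby
# Hypothetical in-memory flag store standing in for the feature flag check.
module SketchFeature
  @flags = {}

  def self.enable(name)
    @flags[name] = true
  end

  def self.enabled?(name)
    @flags.fetch(name, false)
  end
end

# Simplified stand-in for cve_enrichment_sync_worker.rb.
class SketchCveEnrichmentSyncWorker
  # Hypothetical flag name; the issue does not specify one.
  FLAG = :sketch_epss_sync

  # Standard cron expression for "every 5 minutes", shown for illustration.
  CRON = '*/5 * * * *'

  # sync is any callable; the real worker would execute the SyncService.
  def initialize(sync)
    @sync = sync
  end

  def perform
    # Do nothing unless the feature flag is enabled.
    return :skipped unless SketchFeature.enabled?(FLAG)

    @sync.call
    :synced
  end
end

sync_runs = 0
worker = SketchCveEnrichmentSyncWorker.new(-> { sync_runs += 1 })
first_result = worker.perform
SketchFeature.enable(SketchCveEnrichmentSyncWorker::FLAG)
second_result = worker.perform
puts [first_result, second_result, sync_runs].inspect
```

A worker spec can cover both branches in the same way: one example with the flag disabled asserting that the sync is not called, and one with it enabled asserting that it is.
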
Edited by Yasha Rise