Ingest advisory and affected package data into the database

Why are we doing this work

The work in this issue covers adding data imported from the export bucket (or directory) into the database.

Relevant links

Non-functional requirements

  • Documentation: n/a
  • Feature flag: n/a
  • Performance: n/a
  • Testing: n/a

Proposal

Add a database ingestion service that bulk-inserts imported data in a manner similar to PackageMetadata::Ingestion::CompressedPackage::IngestionService.

The ingestion can be split into 2 tasks:

  1. Task to upsert advisory data.
  2. Task to upsert affected package data using the id from task 1 as the foreign key.
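The two-task flow above can be sketched in plain Ruby. This is an illustration only: AdvisoryData, AffectedPackageData, AdvisoryTask, AffectedPackageTask and the in-memory tables are hypothetical stand-ins, keying advisories by uuid is an assumption, and the real tasks would issue bulk upserts against the database.

```ruby
# Hypothetical stand-ins for the data objects; only the fields used here are included.
AdvisoryData = Struct.new(:uuid, :affected_packages, keyword_init: true)
AffectedPackageData = Struct.new(:package_name, :pm_advisory_id, keyword_init: true)

# Task 1: "upserts" advisories into an in-memory table keyed by uuid (an assumption)
# and returns a uuid => id map. A real task would run a bulk upsert instead.
class AdvisoryTask
  def initialize(table)
    @table = table
  end

  def execute(advisories)
    advisories.to_h do |advisory|
      @table[advisory.uuid] ||= @table.size + 1
      [advisory.uuid, @table[advisory.uuid]]
    end
  end
end

# Task 2: appends affected package rows, which must already carry the foreign key.
# Appending is a simplification; it does not deduplicate like a real upsert would.
class AffectedPackageTask
  def initialize(rows)
    @rows = rows
  end

  def execute(packages)
    @rows.concat(packages.map(&:to_h))
  end
end

class IngestionService
  attr_reader :advisory_table, :package_rows

  def initialize
    @advisory_table = {}
    @package_rows = []
  end

  def execute(advisories)
    ids = AdvisoryTask.new(advisory_table).execute(advisories)

    # Propagate the id from task 1 to each affected package before running task 2.
    packages = advisories.flat_map do |advisory|
      advisory.affected_packages.each { |pkg| pkg.pm_advisory_id = ids.fetch(advisory.uuid) }
    end

    AffectedPackageTask.new(package_rows).execute(packages)
  end
end

service = IngestionService.new
service.execute([
  AdvisoryData.new(
    uuid: 'a-1',
    affected_packages: [AffectedPackageData.new(package_name: 'example-package')]
  )
])
```

Keeping the id lookup in the service makes the ordering constraint explicit: task 2 cannot run until task 1 has produced ids.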

Ingesting data objects

Data objects are an abstraction shared between database ingestion and JSON data import. They structure imported data for use in instantiating models. The ingestion service will be invoked with a list of AdvisoryDataObjects, which will be turned into PackageMetadata::Advisory and PackageMetadata::AffectedPackage model instances.

The structure of the emitted data objects will be:

module PackageMetadata
  class AdvisoryDataObject
    attr_accessor :uuid, :source, :published_date, :title, :description, :cvss_v2, :cvss_v3, :urls,
      :identifiers, :affected_packages
  end
end

affected_packages is a list of PackageMetadata::AffectedPackageDataObject with the following structure:

module PackageMetadata
  class AffectedPackageDataObject
    attr_accessor :purl_type, :package_name, :distro_version, :solution, :affected_range,
      :fixed_versions, :pm_advisory_id
  end
end
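As an illustration of how the two data objects nest, the sketch below builds one advisory with a single affected package. The keyword initializer is a hypothetical convenience (the issue only lists the accessors), and all field values are made-up placeholders.

```ruby
module PackageMetadata
  class AffectedPackageDataObject
    attr_accessor :purl_type, :package_name, :distro_version, :solution, :affected_range,
      :fixed_versions, :pm_advisory_id

    # Hypothetical initializer: assigns each keyword through its accessor.
    def initialize(**attrs)
      attrs.each { |name, value| public_send("#{name}=", value) }
    end
  end

  class AdvisoryDataObject
    attr_accessor :uuid, :source, :published_date, :title, :description, :cvss_v2, :cvss_v3,
      :urls, :identifiers, :affected_packages

    # Hypothetical initializer, same approach as above.
    def initialize(**attrs)
      attrs.each { |name, value| public_send("#{name}=", value) }
    end
  end
end

# Placeholder values, not real advisory data.
package = PackageMetadata::AffectedPackageDataObject.new(
  purl_type: 'npm',
  package_name: 'example-package',
  affected_range: '<1.2.3',
  fixed_versions: ['1.2.3'],
  solution: 'Upgrade to 1.2.3 or later.'
)

advisory = PackageMetadata::AdvisoryDataObject.new(
  uuid: '00000000-0000-0000-0000-000000000000',
  source: 'glad',
  title: 'Example advisory',
  affected_packages: [package]
)
```

Note that pm_advisory_id is left unset at this point, since the advisory has not been stored yet.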

Transforming PackageMetadata::AdvisoryDataObject to PackageMetadata::Advisory

These PackageMetadata::AdvisoryDataObject fields have a 1-to-1 mapping to the model:

  • title
  • description
  • cvss_v2
  • cvss_v3
  • published_date
  • urls
  • identifiers
  • advisory_xid
  • source_xid

affected_packages is a list of PackageMetadata::AffectedPackageDataObjects affected by this advisory.

See the exporter's data format description for more info.
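A minimal sketch of the 1-to-1 mapping, assuming the model is built from an attributes hash. The Struct is a stand-in for PackageMetadata::AdvisoryDataObject, and advisory_attributes is a hypothetical helper. Only fields that appear both in the list above and on the data object are copied; how uuid and source relate to advisory_xid and source_xid is not specified here, so they are left out.

```ruby
# Fields listed as 1-to-1 that also exist on the data object.
ADVISORY_FIELDS = %i[title description cvss_v2 cvss_v3 published_date urls identifiers].freeze

# Stand-in for PackageMetadata::AdvisoryDataObject.
AdvisoryDataObject = Struct.new(
  :uuid, :source, *ADVISORY_FIELDS, :affected_packages, keyword_init: true
)

# Hypothetical helper: copies the 1-to-1 fields into a hash of model attributes.
def advisory_attributes(data_object)
  ADVISORY_FIELDS.to_h { |field| [field, data_object.public_send(field)] }
end

data_object = AdvisoryDataObject.new(
  title: 'Example advisory',
  urls: ['https://example.com/advisory'],
  affected_packages: []
)
attrs = advisory_attributes(data_object)
```

affected_packages is deliberately excluded from the hash because it is ingested separately.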

Transforming AffectedPackageDataObject to PackageMetadata::AffectedPackage

PackageMetadata::AdvisoryDataObject.affected_packages stores a list of data objects of type PackageMetadata::AffectedPackageDataObject which correspond to the advisory. Note that this list holds only 1 package for advisories with source type glad.

These PackageMetadata::AffectedPackageDataObject fields have a 1-to-1 mapping to the model:

  • affected_range
  • solution
  • fixed_versions
  • package_name
  • purl_type

pm_advisory_id is set on each affected package data object after the corresponding advisory model has been stored in the database and its id is available. It provides the foreign key to the advisory.
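One way to make the foreign key requirement explicit is to refuse to build attributes until pm_advisory_id is present. The sketch below assumes an attributes hash is what the model is instantiated from. The Struct is a stand-in for PackageMetadata::AffectedPackageDataObject, and affected_package_attributes is a hypothetical helper. distro_version is not in the 1-to-1 list above, so it is not copied.

```ruby
# Fields listed above as mapping 1-to-1 to the model.
AFFECTED_PACKAGE_FIELDS = %i[affected_range solution fixed_versions package_name purl_type].freeze

# Stand-in for PackageMetadata::AffectedPackageDataObject.
AffectedPackageDataObject = Struct.new(
  *AFFECTED_PACKAGE_FIELDS, :distro_version, :pm_advisory_id, keyword_init: true
)

# Hypothetical helper: builds model attributes and guards against a missing foreign key.
def affected_package_attributes(data_object)
  raise ArgumentError, 'pm_advisory_id must be set before ingestion' if data_object.pm_advisory_id.nil?

  AFFECTED_PACKAGE_FIELDS
    .to_h { |field| [field, data_object.public_send(field)] }
    .merge(pm_advisory_id: data_object.pm_advisory_id)
end

package = AffectedPackageDataObject.new(package_name: 'example-package', purl_type: 'npm')

# Without the foreign key the helper raises.
missing_key_error =
  begin
    affected_package_attributes(package)
    nil
  rescue ArgumentError => e
    e
  end

# Once the advisory id is known (42 is a placeholder), the attributes can be built.
package.pm_advisory_id = 42
package_attrs = affected_package_attributes(package)
```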

Implementation plan

  • Update the following models to support bulk upsert via BulkInsertSafe (example).
    • PackageMetadata::Advisory
    • PackageMetadata::AffectedPackage
  • Add PackageMetadata::Ingestion::Advisory::IngestionService.
    • #execute is the entrypoint and is called with a list of instances of type PackageMetadata::AdvisoryDataObject.
  • Add PackageMetadata::Ingestion::Advisory::AdvisoryIngestionTask.
    • #execute is the entrypoint and is called with a list of data objects of type PackageMetadata::AdvisoryDataObject.
    • Create a list of PackageMetadata::Advisory instances instantiated from the corresponding data objects.
    • Validate the instantiated objects against the JSON schema and keep only the valid ones. Log and discard the invalid objects (example).
    • Upsert using PackageMetadata::Advisory.bulk_upsert!.
    • For each inserted advisory set PackageMetadata::AffectedPackageDataObject.pm_advisory_id to the id returned from the query. Affected package data objects corresponding to the inserted advisory are under PackageMetadata::AdvisoryDataObject.affected_packages.
  • Add PackageMetadata::Ingestion::Advisory::AffectedPackageIngestionTask.
    • #execute is the entrypoint and is called with a list of data objects of type PackageMetadata::AffectedPackageDataObject.
    • Create a list of PackageMetadata::AffectedPackage instances instantiated from the corresponding data objects.
    • Validate the instantiated objects against the JSON schema and keep only the valid ones. Log and discard the invalid objects (example).
    • Upsert using PackageMetadata::AffectedPackage.bulk_upsert!.
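The filter step shared by both tasks could look roughly like the sketch below. The Advisory Struct and its valid? method are hypothetical stand-ins: the hand-written check only takes the place of the JSON schema validation described above, and partition_valid is an illustrative helper name.

```ruby
require 'logger'
require 'stringio'

# Stand-in model. valid? is a placeholder for the real JSON schema validation.
Advisory = Struct.new(:title, :urls, keyword_init: true) do
  def valid?
    title.is_a?(String) && !title.empty? && urls.is_a?(Array)
  end
end

# Hypothetical helper: returns the valid records and logs each discarded one.
def partition_valid(records, logger)
  valid, invalid = records.partition(&:valid?)
  invalid.each { |record| logger.warn("Discarding invalid record: #{record.to_h}") }
  valid
end

log = StringIO.new
records = [
  Advisory.new(title: 'Example advisory', urls: []),
  Advisory.new(title: '', urls: nil) # fails the placeholder validation
]
valid_records = partition_valid(records, Logger.new(log))
```

Only valid_records would then be passed on to the bulk upsert, so a single malformed record does not fail the whole batch.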

Note: CompressedPackage::IngestionService is an example of bulk-upserting using 2 tasks.

Edited by Adam Cohen