Ingest advisory and affected package data into the database
Why are we doing this work
The work in this issue covers adding data imported from the export bucket (or directory) into the database.
Relevant links
- license package metadata ingestion:
- issue for the sync service from which ingestion will be invoked: Add worker to trigger package metadata advisory... (#370780 - closed)
Non-functional requirements
- Documentation: n/a
- Feature flag: n/a
- Performance: n/a
- Testing: n/a
Proposal
Add a database ingestion service that bulk-inserts imported data, in a manner similar to PackageMetadata::Ingestion::CompressedPackage::IngestionService.
The worker can be broken up into 2 tasks:
- Task to upsert advisory data.
- Task to upsert affected package data using the id from task 1 as the foreign key.
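The two-task split above can be sketched as follows. This is a minimal illustration and not the actual implementation: IngestionServiceSketch, the lambdas, and the in-memory tables are hypothetical stand-ins, advisories are plain hashes keyed by uuid (an assumption), and the second task appends rows instead of upserting, for brevity.

```ruby
# Hypothetical orchestration: class and method names are illustrative only.
class IngestionServiceSketch
  def initialize(advisory_task:, affected_package_task:)
    @advisory_task = advisory_task
    @affected_package_task = affected_package_task
  end

  def execute(advisories)
    # Task 1: upsert advisories; returns a map of advisory uuid => database id.
    ids = @advisory_task.call(advisories)

    # Task 2: upsert affected packages, using the id from task 1 as the foreign key.
    packages = advisories.flat_map do |advisory|
      advisory[:affected_packages].map do |pkg|
        pkg.merge(pm_advisory_id: ids.fetch(advisory[:uuid]))
      end
    end
    @affected_package_task.call(packages)
  end
end

# In-memory stand-ins for the two database tables (assumption).
advisory_table = {}
package_table = []

advisory_task = lambda do |advisories|
  advisories.to_h do |advisory|
    row = (advisory_table[advisory[:uuid]] ||= { id: advisory_table.size + 1 })
    row[:title] = advisory[:title]
    [advisory[:uuid], row[:id]]
  end
end

# Appends rather than upserts, to keep the sketch short.
affected_package_task = ->(packages) { package_table.concat(packages) }

service = IngestionServiceSketch.new(
  advisory_task: advisory_task,
  affected_package_task: affected_package_task
)
service.execute([
  {
    uuid: "a-1",
    title: "Example advisory",
    affected_packages: [{ package_name: "foo" }, { package_name: "bar" }]
  }
])

p package_table.map { |pkg| pkg[:pm_advisory_id] } # prints [1, 1]
```

Running the advisory task first is what makes the foreign key available: the second task never needs to look up advisories itself.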
Ingesting data objects
Data objects are an abstraction shared between database ingestion and JSON data import. They structure imported data for use in instantiating models. The ingestion service will be invoked with a list of AdvisoryDataObjects, which will be turned into PackageMetadata::Advisory and PackageMetadata::AffectedPackage model instances.
The structure of the emitted data objects will be:
module PackageMetadata
  class AdvisoryDataObject
    # The trailing comma continues the argument list onto the next line.
    attr_accessor :uuid, :source, :published_date, :title, :description, :cvss_v2, :cvss_v3, :urls, :identifiers,
      :affected_packages
  end
end
affected_packages is a list of PackageMetadata::AffectedPackageDataObject with the following structure:
module PackageMetadata
  class AffectedPackageDataObject
    # pm_advisory_id is filled in once the advisory has been stored.
    attr_accessor :purl_type, :package_name, :distro_version, :solution, :affected_range, :fixed_versions,
      :pm_advisory_id
  end
end
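As a usage illustration, a data object and its nested affected package could be populated as shown below. This is a minimal sketch: the Struct definitions are stand-ins for the two classes above so the example runs on its own, and every field value is made up.

```ruby
# Stand-ins for the data object classes, written as Structs (assumption).
# Field names come from this issue; all values are invented.
AdvisoryDataObject = Struct.new(
  :uuid, :source, :published_date, :title, :description,
  :cvss_v2, :cvss_v3, :urls, :identifiers, :affected_packages,
  keyword_init: true
)

AffectedPackageDataObject = Struct.new(
  :purl_type, :package_name, :distro_version, :solution,
  :affected_range, :fixed_versions, :pm_advisory_id,
  keyword_init: true
)

package = AffectedPackageDataObject.new(
  purl_type: "npm",
  package_name: "example-package",
  affected_range: "<1.2.3",
  fixed_versions: ["1.2.3"],
  solution: "Upgrade to 1.2.3 or later."
)

advisory = AdvisoryDataObject.new(
  uuid: "00000000-0000-0000-0000-000000000000",
  source: "glad",
  title: "Example advisory",
  affected_packages: [package]
)

# pm_advisory_id stays nil until the advisory has been stored.
p advisory.affected_packages.first.pm_advisory_id # prints nil
```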
Transforming PackageMetadata::AdvisoryDataObject to PackageMetadata::Advisory
These PackageMetadata::AdvisoryDataObject fields have a 1-to-1 mapping to the model:
- title
- description
- cvss_v2
- cvss_v3
- published_date
- urls
- identifiers
- advisory_xid
- source_xid
affected_packages is a list of PackageMetadata::AffectedPackageDataObjects affected by this advisory.
See the exporter's data format description for more info.
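The 1-to-1 mapping can be sketched as a simple attribute slice. This is an illustration under assumptions: the Struct stands in for the data object, advisory_attributes is a hypothetical helper, and advisory_xid and source_xid are left out because this issue does not spell out which data object attributes feed them.

```ruby
# Fields from the list above that exist under the same name on the data object.
ADVISORY_FIELDS = %i[title description cvss_v2 cvss_v3 published_date urls identifiers].freeze

# Stand-in for PackageMetadata::AdvisoryDataObject (assumption).
AdvisoryDataObject = Struct.new(
  *ADVISORY_FIELDS, :uuid, :source, :affected_packages,
  keyword_init: true
)

# Hypothetical helper: returns the attribute hash that could be handed to the
# model constructor. Non-model fields such as affected_packages are dropped.
def advisory_attributes(data_object)
  data_object.to_h.slice(*ADVISORY_FIELDS)
end

attrs = advisory_attributes(
  AdvisoryDataObject.new(
    title: "Example advisory",
    urls: ["https://example.com/advisory"],
    affected_packages: []
  )
)

p attrs.key?(:affected_packages) # prints false
```

Keeping the field list in one constant means the mapping and any later validation can share a single definition.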
Transforming AffectedPackageDataObject to PackageMetadata::AffectedPackage
PackageMetadata::AdvisoryDataObject.affected_packages stores a list of data objects of type PackageMetadata::AffectedPackageDataObject which correspond to the advisory. Note that this list holds only one package for advisories with source type glad.
These PackageMetadata::AffectedPackageDataObject fields have a 1-to-1 mapping to the model:
- affected_range
- solution
- fixed_versions
- package_name
- purl_type
pm_advisory_id is set on each affected package data object after the corresponding advisory model has been stored in the database and its id is available. It provides the foreign key to the advisory.
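A minimal sketch of this transformation, assuming the foreign key must be present before the model attributes are built. The Struct, the affected_package_attributes helper, and the id value 42 are hypothetical.

```ruby
# Fields with a 1-to-1 mapping to the model, as listed above.
AFFECTED_PACKAGE_FIELDS = %i[affected_range solution fixed_versions package_name purl_type].freeze

# Stand-in for PackageMetadata::AffectedPackageDataObject (assumption).
AffectedPackageDataObject = Struct.new(
  *AFFECTED_PACKAGE_FIELDS, :distro_version, :pm_advisory_id,
  keyword_init: true
)

# Hypothetical helper: builds model attributes, refusing objects whose
# foreign key has not been assigned yet.
def affected_package_attributes(data_object)
  if data_object.pm_advisory_id.nil?
    raise ArgumentError, "pm_advisory_id must be set before building the model"
  end

  data_object.to_h.slice(*AFFECTED_PACKAGE_FIELDS, :pm_advisory_id)
end

package = AffectedPackageDataObject.new(
  purl_type: "npm",
  package_name: "example-package",
  affected_range: "<1.2.3",
  fixed_versions: ["1.2.3"],
  solution: "Upgrade to 1.2.3 or later."
)
package.pm_advisory_id = 42 # id of the stored advisory (made-up value)

attrs = affected_package_attributes(package)
p attrs[:pm_advisory_id] # prints 42
```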
Implementation plan
- Update the following models to support bulk upsert via BulkInsertSafe (example):
  - PackageMetadata::Advisory
  - PackageMetadata::AffectedPackage
- Add PackageMetadata::Ingestion::Advisory::IngestionService.
  - #execute is the entrypoint and is called with a list of instances of type PackageMetadata::AdvisoryDataObject.
- Add PackageMetadata::Ingestion::Advisory::AdvisoryIngestionTask.
  - #execute is the entrypoint and is called with a list of data objects of type PackageMetadata::AdvisoryDataObject.
  - Create a list of PackageMetadata::Advisory instances instantiated from the corresponding data objects.
  - Filter instantiated objects by using JSON schema validation to only use valid objects. Discard and log the invalid objects (example).
  - Upsert using PackageMetadata::Advisory.bulk_upsert!.
  - For each inserted advisory, set PackageMetadata::AffectedPackageDataObject.pm_advisory_id to the id returned from the query. Affected package data objects corresponding to the inserted advisory are under PackageMetadata::AdvisoryDataObject.affected_packages.
- Add PackageMetadata::Ingestion::Advisory::AffectedPackageIngestionTask.
  - #execute is the entrypoint and is called with a list of data objects of type PackageMetadata::AffectedPackageDataObject.
  - Create a list of PackageMetadata::AffectedPackage instances instantiated from the corresponding data objects.
  - Filter instantiated objects by using JSON schema validation to only use valid objects. Discard and log the invalid objects (example).
  - Upsert using PackageMetadata::AffectedPackage.bulk_upsert!.
Note: CompressedPackage::IngestionService is an example of bulk-upserting using 2 tasks.
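The AdvisoryIngestionTask steps in the plan (filter, discard and log, upsert, propagate ids) could look roughly like the sketch below. It runs without Rails, so several parts are assumptions: FakeAdvisoryStore stands in for PackageMetadata::Advisory.bulk_upsert! and is assumed to return ids in input order, a presence check replaces JSON schema validation, validation runs on the data objects and not on model instances, and uuid is used as the upsert key.

```ruby
require "logger"

# Trimmed stand-ins for the data objects (assumption).
AdvisoryDataObject = Struct.new(:uuid, :title, :affected_packages, keyword_init: true)
AffectedPackageDataObject = Struct.new(:package_name, :pm_advisory_id, keyword_init: true)

# Stand-in for the advisory table: uuid => row hash.
class FakeAdvisoryStore
  def initialize
    @rows = {}
  end

  # Inserts or updates rows keyed by uuid; returns the ids in input order.
  def bulk_upsert!(records)
    records.map do |record|
      row = (@rows[record[:uuid]] ||= { id: @rows.size + 1 })
      row.merge!(record)
      row[:id]
    end
  end
end

class AdvisoryIngestionTaskSketch
  def initialize(store:, logger: Logger.new($stderr))
    @store = store
    @logger = logger
  end

  # Returns the affected package data objects that are ready for the next task.
  def execute(data_objects)
    valid, invalid = data_objects.partition { |obj| valid?(obj) }
    invalid.each { |obj| @logger.warn("discarding invalid advisory #{obj.uuid.inspect}") }

    ids = @store.bulk_upsert!(valid.map { |obj| { uuid: obj.uuid, title: obj.title } })

    # Propagate each returned id to the advisory's affected packages.
    valid.zip(ids).each do |obj, id|
      obj.affected_packages.each { |pkg| pkg.pm_advisory_id = id }
    end

    valid.flat_map(&:affected_packages)
  end

  private

  # Placeholder for JSON schema validation.
  def valid?(obj)
    !obj.uuid.to_s.empty? && !obj.title.to_s.empty?
  end
end

good = AdvisoryDataObject.new(
  uuid: "a-1", title: "Example advisory",
  affected_packages: [AffectedPackageDataObject.new(package_name: "foo")]
)
bad = AdvisoryDataObject.new(
  uuid: "a-2", title: nil,
  affected_packages: [AffectedPackageDataObject.new(package_name: "bar")]
)

packages = AdvisoryIngestionTaskSketch.new(store: FakeAdvisoryStore.new).execute([good, bad])
p packages.map(&:pm_advisory_id) # prints [1]
```

Returning only the packages of valid advisories means the affected package task never sees rows without a foreign key.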