Add DB Schema changes for ingesting OS package advisories

Goal

Prepare the license-db to ingest OS package advisories. The source of the advisories is the trivy-db-glad.

Related links

Deliverable

  • Add the required migration scripts to the schema project
  • Deploy the migration scripts on dev and prod

Requirements

  • Store raw data in a single jsonb column.
  • Extract values needed to support UPSERTs, and store them in separate DB columns. These values are NOT removed from the raw JSON document. This is for indexing only.
  • Process the raw data during the export.

Pros & cons

  • The ingestion is simple.
  • The export can evolve and leverage more fields w/o re-ingesting anything. For instance, in the future we could export YAML fields of the GLAD advisories we haven't leveraged so far. We wouldn't need to re-ingest any YAML file.
  • This makes the export complex, relative to the ingestion.

Relates to

#422869 (comment 1522730005)\+

Implementation Plan

We need to introduce the following tables

Table trivy_db_advisory
id BIGSERIAL (PK)
cve_id text
last_updated timestamp
advisory jsonb
Table trivy_db_package
id BIGSERIAL (PK)
distro_name string
distro_version string
package_name string
advisory_id BIGSERIAL (FK)
node_value jsonb
last_updated timestamp

trivy_db_advisory

This table contains all the advisories defined by the vulnerability bucket in the trivy-db.

  • id: is a PK. Just a sequential big int
  • cve_id: It will contain the vulnerability identifier as a string
  • last_updated: a timestamp that will be updated on any modification
  • advisory: a jsonb column that contains the vulnerability from the trivy-db as is. Without any modifications.

trivy_db_package

This table contains mapping of trivy-db vulnerabilities to OS packages. It also contains information about fixed versions.

  • id: A sequential big int PK
  • distro_name: the name of the distro. For example alpine
  • distro_version: the version of the distro. For example 3.1.
  • package_name: the name of the package. For example curl.
  • advisory_id: the id of the advisory. This field is a FK.
  • node_value: This value contains a json object containing one or more fields of the trivy-db Advisory struct. Usually OS packages have the FixedVersion field which is what we intent to support in the first iteration.

Unique constrain: the combination of distro_name, distro_version, package_name and advisory_id.

Tasks

  • Introduce a migration script that adds these tables in the schema project.
  • Deploy on dev
  • Deploy on prod
Edited by Nick Ilieskou