Extend advisory-feeder to ingest all trivy-db data (without a cursor)

Goal

The goal is to extend the advisory-feeder with a new data source, trivy-db, which contains advisories about OS packages. Although trivy-db is the upstream source, in practice we get our data from trivy-db-glad.

Relevant links

  • #422869 (comment 1522612583)
  • Extend advisory-feeder to ingest latest changes... (#423391 - closed)
  • GitLab container registry API
  • Trivy-db advisory structure fields

Introduction

To avoid creating a single heavily weighted issue, I will divide the implementation into two parts:

  1. This issue covers ingesting all trivy-db packages and vulnerabilities. This means that we won't compare against the last processed trivy.db file.
  2. The second issue will extend the advisory-feeder to check the cursor and find the diff between the latest trivy.db file and the last processed trivy.db file.

Requirements

  • The user needs to specify the source of advisory data in the advisory-feeder command. This can be done through the flag --source=glad or --source=trivy-db.
  • For this issue, the --internal-bucket flag, which specifies the bucket that contains the cursor, will not be used.
  • The user needs to specify the topics to be used. For trivy-db as a source, we will introduce two new flags: --send-topic-advisories and --send-topic-os-pkg.
  • We do not need to ingest application packages (aka package manager packages), so we can ignore all trivy.db buckets that contain :: in their name.
  • In this first iteration we won't ingest Red Hat packages. Consequently, we also ignore the Red Hat CPE bucket.
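
A minimal sketch of how the requirements above could translate into flag validation and bucket filtering, using only Go's standard flag package. The names parseFlags, skipBucket and Config are hypothetical, and the exact bucket name "Red Hat" is an assumption about the trivy.db layout; this is not the feeder's actual code.

```go
package main

import (
	"errors"
	"flag"
	"fmt"
	"strings"
)

// Config is a hypothetical container for the flags named in the requirements.
type Config struct {
	Source          string
	TopicAdvisories string
	TopicOSPkg      string
}

// parseFlags parses the new flags and checks that the combination is valid.
func parseFlags(args []string) (Config, error) {
	var cfg Config
	fs := flag.NewFlagSet("advisory-feeder", flag.ContinueOnError)
	fs.StringVar(&cfg.Source, "source", "", "advisory source: glad or trivy-db")
	fs.StringVar(&cfg.TopicAdvisories, "send-topic-advisories", "", "topic for trivy-db advisories")
	fs.StringVar(&cfg.TopicOSPkg, "send-topic-os-pkg", "", "topic for trivy-db OS packages")
	if err := fs.Parse(args); err != nil {
		return Config{}, err
	}

	switch cfg.Source {
	case "glad":
		// The existing glad topic flag is out of scope for this sketch.
	case "trivy-db":
		if cfg.TopicAdvisories == "" || cfg.TopicOSPkg == "" {
			return Config{}, errors.New("trivy-db requires --send-topic-advisories and --send-topic-os-pkg")
		}
	default:
		return Config{}, fmt.Errorf("unsupported --source %q", cfg.Source)
	}
	return cfg, nil
}

// skipBucket reports whether a top-level trivy.db bucket should be ignored.
// Buckets containing "::" hold application packages; the Red Hat bucket
// names are assumed here for illustration.
func skipBucket(name string) bool {
	if strings.Contains(name, "::") {
		return true
	}
	return name == "Red Hat" || name == "Red Hat CPE"
}

func main() {
	// Example arguments; the topic names are placeholders.
	cfg, err := parseFlags([]string{
		"--source=trivy-db",
		"--send-topic-advisories=example-advisories",
		"--send-topic-os-pkg=example-os-pkg",
	})
	if err != nil {
		fmt.Println("invalid flags:", err)
		return
	}
	fmt.Printf("source=%s\n", cfg.Source)

	for _, b := range []string{"alpine 3.18", "npm::GitHub Security Advisory Npm", "Red Hat CPE"} {
		fmt.Printf("%q skipped: %v\n", b, skipBucket(b))
	}
}
```

Validating the topic flags only when --source=trivy-db keeps existing glad invocations working unchanged.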

Implementation plan

  • Introduce the new --source flag.
  • Introduce the required pubsub topic flags for trivy-db.
  • Rename the --send-topic flag to --glad-topic.
  • Modify main so that we can differentiate between the two sources.
  • Introduce a new package feeders/advisory/trivy-db. This package is responsible for using the GitLab Registry API to download the latest image, unzipping it, reading all the data into memory, and sending it to the correct topic. We can use the bbolt library to read the trivy.db file.
  • Introduce unit tests wherever possible.
  • Verify that we publish all the information.
  • Release the advisory feeder with trivy-db capabilities.
  • Update run_feeder.sh according to the new cli flags.
  • Update the CI job to run the feeder.
  • Update the advisory feeder binary version.
  • Update the scheduled advisory feeder jobs, on both dev and prod, so that they set the env var ADV_FEEDER_SOURCE=glad.
  • Update Feeder testing instructions as needed.
  • Update Creating scheduled pipelines for license-feeder with the new flags that have been added.
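
The "read all the data in memory and send it to the correct topic" step of the plan can be sketched as below. To stay self-contained, the sketch uses a hypothetical in-memory Bucket type in place of a real bbolt bucket; with bbolt the same walk would run inside a read-only transaction. The Message type, the function names, and the rule that the "vulnerability" bucket goes to the advisories topic while every other kept bucket goes to the OS package topic are all assumptions, not confirmed behaviour.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Bucket is a hypothetical stand-in for a nested bbolt bucket:
// a key maps either to a raw value or to a child bucket.
type Bucket struct {
	Values  map[string][]byte
	Buckets map[string]*Bucket
}

// Message is a hypothetical payload destined for a pubsub topic.
type Message struct {
	Topic string
	Path  []string // bucket names followed by the key
	Data  []byte
}

// Topics holds the values of the two new topic flags.
type Topics struct{ Advisories, OSPkg string }

// isAppBucket reports whether a bucket holds application packages.
func isAppBucket(name string) bool { return strings.Contains(name, "::") }

// sortedKeys returns the map keys in a deterministic order.
func sortedKeys[V any](m map[string]V) []string {
	keys := make([]string, 0, len(m))
	for k := range m {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	return keys
}

// collect walks every kept top-level bucket and assigns a topic to each record.
func collect(root *Bucket, t Topics, skip func(string) bool) []Message {
	var out []Message
	for _, name := range sortedKeys(root.Buckets) {
		if skip(name) {
			continue
		}
		topic := t.OSPkg
		if name == "vulnerability" { // assumed name of the advisory details bucket
			topic = t.Advisories
		}
		walk(root.Buckets[name], []string{name}, topic, &out)
	}
	return out
}

// walk recursively emits one message per key/value pair under b.
func walk(b *Bucket, path []string, topic string, out *[]Message) {
	for _, k := range sortedKeys(b.Values) {
		p := append(append([]string{}, path...), k)
		*out = append(*out, Message{Topic: topic, Path: p, Data: b.Values[k]})
	}
	for _, k := range sortedKeys(b.Buckets) {
		walk(b.Buckets[k], append(append([]string{}, path...), k), topic, out)
	}
}

func main() {
	// Made-up sample data; the identifiers are placeholders.
	root := &Bucket{Buckets: map[string]*Bucket{
		"vulnerability": {Values: map[string][]byte{"CVE-0000-0000": []byte(`{}`)}},
		"alpine 3.18": {Buckets: map[string]*Bucket{
			"openssl": {Values: map[string][]byte{"CVE-0000-0000": []byte(`{}`)}},
		}},
		"npm::example": {Values: map[string][]byte{"ignored": []byte(`{}`)}},
	}}
	topics := Topics{Advisories: "example-advisories", OSPkg: "example-os-pkg"}
	for _, m := range collect(root, topics, isAppBucket) {
		fmt.Println(m.Topic, strings.Join(m.Path, "/"))
	}
}
```

Collecting messages before publishing keeps the walk easy to unit test without a pubsub client, which fits the plan's call for unit tests wherever possible.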
Edited Oct 12, 2023 by Nick Ilieskou