Clean up old vulnerability data stored in DB

Problem to solve

The way we store vulnerabilities in the DB means some data becomes obsolete over time, so we must clean it up to avoid filling the DB indefinitely.

So we need to define a retention period (like the artifacts expiration date) and trigger a cleanup job to delete old records.

This has to be carefully designed, keeping in mind that we want historical metrics for vulnerabilities.

Further details

The usual lifecycle itself requires a cleanup policy but there are also some specific cases that could lead to stale data we want to get rid of:

  • if a report type has been removed from the config, no new report of that type will be produced, so existing records for that category never get cleaned up.
  • if the default branch changes, vulnerabilities stored for the previous one will stay forever.
  • once we decide to support other branches, old refs will also stay forever, even if the branch is removed or no longer active.

Proposal

Run a periodic background job to clean up old data. The cleanup job should delete:

  • vulnerability_occurrences records that only belong to pipelines older than the retention period (join through the vulnerability_occurrence_pipelines join model). This will also automatically delete the matching vulnerability_occurrence_pipelines and vulnerability_occurrence_identifiers records, thanks to the on_delete: :cascade option on their FKs.
  • vulnerability_identifiers records that aren't used anymore (without matching vulnerability_occurrence_identifiers join model records)
  • vulnerability_scanners records that aren't used anymore (no vulnerability_occurrences record with matching scanner_id)

This is a first iteration that covers all cases after a given amount of time. We can improve it to react immediately to some specific edge cases if necessary (branch removed, report disabled, etc.).
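To make the three deletion steps of the proposal concrete, here is a minimal sketch using Python's built-in sqlite3 module rather than the real application code. The table names come from the proposal. Everything else is an assumption made for illustration: the column names (occurrence_id, pipeline_id, identifier_id, created_at), the pipelines table, the open_db and cleanup helpers, and timestamps stored as ISO 8601 strings. Note that, as written, an occurrence with no pipeline rows at all is also treated as stale.

```python
import sqlite3

# Hypothetical, simplified schema. Only the table names are taken from the
# proposal; the columns are assumptions made for this sketch.
SCHEMA = """
CREATE TABLE pipelines (id INTEGER PRIMARY KEY, created_at TEXT NOT NULL);
CREATE TABLE vulnerability_scanners (id INTEGER PRIMARY KEY);
CREATE TABLE vulnerability_identifiers (id INTEGER PRIMARY KEY);
CREATE TABLE vulnerability_occurrences (
  id INTEGER PRIMARY KEY,
  scanner_id INTEGER NOT NULL REFERENCES vulnerability_scanners(id)
);
CREATE TABLE vulnerability_occurrence_pipelines (
  occurrence_id INTEGER NOT NULL
    REFERENCES vulnerability_occurrences(id) ON DELETE CASCADE,
  pipeline_id INTEGER NOT NULL REFERENCES pipelines(id)
);
CREATE TABLE vulnerability_occurrence_identifiers (
  occurrence_id INTEGER NOT NULL
    REFERENCES vulnerability_occurrences(id) ON DELETE CASCADE,
  identifier_id INTEGER NOT NULL REFERENCES vulnerability_identifiers(id)
);
"""


def open_db() -> sqlite3.Connection:
    """Create an in-memory database with the sketch schema."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys (and cascades) when this is enabled.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def cleanup(conn: sqlite3.Connection, cutoff: str) -> None:
    """Delete stale records; `cutoff` is an ISO 8601 timestamp string."""
    with conn:  # one transaction, committed on success
        # 1. Occurrences with no pipeline at or after the cutoff. The join
        #    rows disappear through ON DELETE CASCADE.
        conn.execute(
            """
            DELETE FROM vulnerability_occurrences
            WHERE NOT EXISTS (
              SELECT 1
              FROM vulnerability_occurrence_pipelines op
              JOIN pipelines p ON p.id = op.pipeline_id
              WHERE op.occurrence_id = vulnerability_occurrences.id
                AND p.created_at >= ?
            )
            """,
            (cutoff,),
        )
        # 2. Identifiers no longer referenced by any occurrence.
        conn.execute(
            """
            DELETE FROM vulnerability_identifiers
            WHERE NOT EXISTS (
              SELECT 1 FROM vulnerability_occurrence_identifiers oi
              WHERE oi.identifier_id = vulnerability_identifiers.id
            )
            """
        )
        # 3. Scanners no longer referenced by any occurrence.
        conn.execute(
            """
            DELETE FROM vulnerability_scanners
            WHERE NOT EXISTS (
              SELECT 1 FROM vulnerability_occurrences o
              WHERE o.scanner_id = vulnerability_scanners.id
            )
            """
        )
```

The order matters: identifiers and scanners can only be recognised as unused once the stale occurrences (and their cascaded join rows) are gone. A periodic job would compute the cutoff from the chosen retention period and call cleanup(conn, cutoff). On a large table the real job would likely need to delete in batches, which this sketch does not attempt.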

What does success look like, and how can we measure that?

(If no way to measure success, link to an issue that will implement a way to measure this)

Links / references

Edited by 🤖 GitLab Bot 🤖