Follow up on vulnerabilities DB model
### Problem to solve
Vulnerabilities are now stored in the database (SAST only for now, but other report types are being added).
As it's not possible to correctly estimate how these tables will grow over time, and because the way we use them is still being designed, it was hard to optimize them right from the start. After a few releases we should have enough records to address these topics:
- how quickly the DB is filling up (and how much space it takes)
- which queries we are running
- which indexes we can add to speed things up (if necessary)
### Further details
Table sizes / number of records:

- `vulnerability_occurrences`: TODO
- `vulnerability_occurrence_pipelines`: TODO
- `vulnerability_identifiers`: TODO
- `vulnerability_occurrence_identifiers`: TODO
- `vulnerability_scanners`: TODO
- `vulnerability_feedback`: TODO
Raw counts available at: https://gitlab.com/gitlab-com/gl-infra/infrastructure/snippets/1807354
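The production numbers should come from the snippet linked above. Purely as an illustration of how such counts can be gathered, here is a minimal Python sketch (assuming `psycopg2` and read access to a PostgreSQL replica; the DSN is hypothetical) that reports the on-disk size and an approximate row count for each table:

```python
# Minimal sketch, not the linked snippet: report on-disk size and an approximate
# row count (from pg_class.reltuples) for each vulnerability table.
import psycopg2

TABLES = [
    "vulnerability_occurrences",
    "vulnerability_occurrence_pipelines",
    "vulnerability_identifiers",
    "vulnerability_occurrence_identifiers",
    "vulnerability_scanners",
    "vulnerability_feedback",
]

# Hypothetical DSN; in practice this would point at a read-only replica.
conn = psycopg2.connect("dbname=gitlabhq_production")

with conn, conn.cursor() as cur:
    for table in TABLES:
        cur.execute(
            """
            SELECT pg_size_pretty(pg_total_relation_size(%s)),
                   (SELECT reltuples::bigint FROM pg_class WHERE relname = %s)
            """,
            (table, table),
        )
        size, approx_rows = cur.fetchone()
        print(f"{table}: {size}, ~{approx_rows} rows")
```

Sampling these numbers over a few releases would also answer the "how quickly is the DB filling up" question above.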
Queries
For now, these tables are only queried from the group-level Security Dashboard (a sketch of one such query follows the list):
- list vulnerabilities found in the last pipeline of each project of a group, for the default branch. Optionally filter by `report_type`, by `severity`, by `project`.
- count vulnerabilities found in the last pipeline of each project of a group, for the default branch, grouped by `report_type` and `severity`. Optionally filter by `report_type`, by `severity`, by `project`.
- count vulnerabilities of each project of a group, for the default branch, grouped by `day` and `severity`. Optionally filter by `report_type`, by `severity`, by `project`.
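To make the second query concrete, here is a rough Python/SQL sketch. The column names (`project_id`, `report_type`, `severity`, `occurrence_id`, `pipeline_id`) and the join through `vulnerability_occurrence_pipelines` are assumptions based on the table names listed in this issue, not a copy of the actual dashboard code:

```python
# Rough sketch of the grouped count, assuming the caller has already resolved
# the group's project ids and the id of each project's latest default-branch pipeline.
import psycopg2


def count_by_report_type_and_severity(conn, project_ids, pipeline_ids):
    sql = """
        SELECT vo.report_type, vo.severity, COUNT(*)
        FROM vulnerability_occurrences vo
        JOIN vulnerability_occurrence_pipelines vop ON vop.occurrence_id = vo.id
        WHERE vo.project_id = ANY(%s)
          AND vop.pipeline_id = ANY(%s)
        GROUP BY vo.report_type, vo.severity
    """
    with conn.cursor() as cur:
        cur.execute(sql, (project_ids, pipeline_ids))
        return cur.fetchall()  # [(report_type, severity, count), ...]
```

The optional filters on `report_type`, `severity`, and `project` would add further `WHERE` clauses; which of these columns (if any) belong in an index is exactly what this issue should decide.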
### Proposal
- Get info on table sizes / record counts on the production DB
- Think/discuss
- Add relevant indexes (see the sketch below)
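One possible way to ground the "think/discuss" and "add relevant indexes" steps is to capture `EXPLAIN ANALYZE` output for the dashboard queries before and after creating a candidate index. The index below, on `vulnerability_occurrences (project_id, report_type, severity)`, is a hypothetical example only; the actual columns should come out of the discussion:

```python
# Sketch only: compare query plans before/after a hypothetical candidate index.
import psycopg2

CANDIDATE_INDEX = """
    CREATE INDEX CONCURRENTLY IF NOT EXISTS
        index_vuln_occurrences_on_project_report_type_severity
    ON vulnerability_occurrences (project_id, report_type, severity)
"""


def explain(conn, sql, params=None):
    """Return the EXPLAIN ANALYZE plan for a query as a single string."""
    with conn.cursor() as cur:
        cur.execute("EXPLAIN ANALYZE " + sql, params)
        return "\n".join(row[0] for row in cur.fetchall())


def create_candidate_index(conn):
    # CREATE INDEX CONCURRENTLY cannot run inside a transaction block,
    # hence autocommit.
    conn.autocommit = True
    with conn.cursor() as cur:
        cur.execute(CANDIDATE_INDEX)
```

Comparing the two plans (sequential scan vs. index scan, and the actual timings) should drive the decision, rather than adding indexes speculatively.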
### What does success look like, and how can we measure that?
(If no way to measure success, link to an issue that will implement a way to measure this)
### Links / references
Original discussion about indexes: https://gitlab.com/gitlab-org/gitlab-ee/merge_requests/7757#note_106224252