Follow up on vulnerabilities DB model
### Problem to solve
Vulnerabilities are now stored in the database (SAST only for now, but other report types are being added).
As it's not possible to correctly estimate how these tables will grow over time, and because the way we use them is still being designed, it was hard to optimize them right from the start. After a few releases we should have enough records to address these topics:
- how quickly the DB is filling up (and how much space it takes)
- which queries we are running
- which indexes we can add to speed things up (if necessary)
### Further details
Table sizes / number of records:

- `vulnerability_occurrences`: TODO
- `vulnerability_occurrence_pipelines`: TODO
- `vulnerability_identifiers`: TODO
- `vulnerability_occurrence_identifiers`: TODO
- `vulnerability_scanners`: TODO
- `vulnerability_feedback`: TODO
Raw counts available at: https://gitlab.com/gitlab-com/gl-infra/infrastructure/snippets/1807354
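The production numbers should come from the snippet linked above. Purely as an illustration of how such counts can be gathered, here is a minimal Python sketch (assuming `psycopg2` and read access to a PostgreSQL replica; the DSN is hypothetical) that reports the on-disk size and an approximate row count for each table:

```python
# Minimal sketch, not the linked snippet: report on-disk size and an approximate
# row count (from pg_class.reltuples) for each vulnerability table.
import psycopg2

TABLES = [
    "vulnerability_occurrences",
    "vulnerability_occurrence_pipelines",
    "vulnerability_identifiers",
    "vulnerability_occurrence_identifiers",
    "vulnerability_scanners",
    "vulnerability_feedback",
]

# Hypothetical DSN; in practice this would point at a read-only replica.
conn = psycopg2.connect("dbname=gitlabhq_production")

with conn, conn.cursor() as cur:
    for table in TABLES:
        cur.execute(
            """
            SELECT pg_size_pretty(pg_total_relation_size(%s)),
                   (SELECT reltuples::bigint FROM pg_class WHERE relname = %s)
            """,
            (table, table),
        )
        size, approx_rows = cur.fetchone()
        print(f"{table}: {size}, ~{approx_rows} rows")
```

Sampling these numbers over a few releases would also answer the "how quickly is the DB filling up" question above.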
Queries
For now, these tables are only queried from the group-level Security Dashboard (a sketch of one such query follows the list):
- list vulnerabilities found in the last pipeline of each project of a group, for the default branch. Optionally filter by `report_type`, by `severity`, by `project`.
- count vulnerabilities found in the last pipeline of each project of a group, for the default branch, grouped by `report_type` and `severity`. Optionally filter by `report_type`, by `severity`, by `project`.
- count vulnerabilities of each project of a group, for the default branch, grouped by `day` and `severity`. Optionally filter by `report_type`, by `severity`, by `project`.
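To make the second query concrete, here is a rough Python/SQL sketch. The column names (`project_id`, `report_type`, `severity`, `occurrence_id`, `pipeline_id`) and the join through `vulnerability_occurrence_pipelines` are assumptions based on the table names listed in this issue, not a copy of the actual dashboard code:

```python
# Rough sketch of the grouped count, assuming the caller has already resolved
# the group's project ids and the id of each project's latest default-branch pipeline.
import psycopg2


def count_by_report_type_and_severity(conn, project_ids, pipeline_ids):
    sql = """
        SELECT vo.report_type, vo.severity, COUNT(*)
        FROM vulnerability_occurrences vo
        JOIN vulnerability_occurrence_pipelines vop ON vop.occurrence_id = vo.id
        WHERE vo.project_id = ANY(%s)
          AND vop.pipeline_id = ANY(%s)
        GROUP BY vo.report_type, vo.severity
    """
    with conn.cursor() as cur:
        cur.execute(sql, (project_ids, pipeline_ids))
        return cur.fetchall()  # [(report_type, severity, count), ...]
```

The optional filters on `report_type`, `severity`, and `project` would add further `WHERE` clauses; which of these columns (if any) belong in an index is exactly what this issue should decide.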
### Proposal
- Get info on table sizes / record counts on the production DB
- Think/discuss
- Add relevant indexes (see the sketch below)
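One possible way to ground the "think/discuss" and "add relevant indexes" steps is to capture `EXPLAIN ANALYZE` output for the dashboard queries before and after creating a candidate index. The index below, on `vulnerability_occurrences (project_id, report_type, severity)`, is a hypothetical example only; the actual columns should come out of the discussion:

```python
# Sketch only: compare query plans before/after a hypothetical candidate index.
import psycopg2

CANDIDATE_INDEX = """
    CREATE INDEX CONCURRENTLY IF NOT EXISTS
        index_vuln_occurrences_on_project_report_type_severity
    ON vulnerability_occurrences (project_id, report_type, severity)
"""


def explain(conn, sql, params=None):
    """Return the EXPLAIN ANALYZE plan for a query as a single string."""
    with conn.cursor() as cur:
        cur.execute("EXPLAIN ANALYZE " + sql, params)
        return "\n".join(row[0] for row in cur.fetchall())


def create_candidate_index(conn):
    # CREATE INDEX CONCURRENTLY cannot run inside a transaction block,
    # hence autocommit.
    conn.autocommit = True
    with conn.cursor() as cur:
        cur.execute(CANDIDATE_INDEX)
```

Comparing the two plans (sequential scan vs. index scan, and the actual timings) should drive the decision, rather than adding indexes speculatively.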
### What does success look like, and how can we measure that?
(If no way to measure success, link to an issue that will implement a way to measure this)
### Links / references
Original discussion about indexes: https://gitlab.com/gitlab-org/gitlab-ee/merge_requests/7757#note_106224252