Add a Scan DB model to persist status and metadata of security reports
Problem to solve
In several places, we currently cannot tell whether no report is available or a report exists with no vulnerabilities. This is because we store or fetch only a list of vulnerabilities, so an empty list covers both cases.
There are features that require deeper knowledge, and having a Scan DB model could help.
Note: Previously, it was suggested this new model be called Report. This has been updated to Scan.
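To illustrate the ambiguity, here is a minimal Ruby sketch. The names `vulnerabilities_by_build`, `scans_by_build`, and `report_state` are hypothetical and exist only for this illustration; they are not existing GitLab code.

```ruby
# Hypothetical in-memory data, for illustration only.
# Build 1 ran SAST and found nothing; build 2 never produced a SAST report.
vulnerabilities_by_build = { 1 => [], 2 => [] }

# With only a list of vulnerabilities, both builds look identical.
same = vulnerabilities_by_build[1] == vulnerabilities_by_build[2]

# With a dedicated scan record, the two cases can be told apart.
scans_by_build = { 1 => [:sast], 2 => [] }

def report_state(scans, vulnerabilities)
  return :no_report if scans.empty?

  vulnerabilities.empty? ? :report_without_vulnerabilities : :report_with_vulnerabilities
end

states = [1, 2].map do |id|
  report_state(scans_by_build[id], vulnerabilities_by_build[id])
end
```

Here `same` is true, which is the problem described above, while `states` separates the build without a report from the build whose report is empty.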
Further details
Benefits of having a dedicated record for scans:
- We could easily know the last known state to improve vulnerability history chart queries. https://gitlab.com/gitlab-org/gitlab-ee/issues/9526
- We could distinguish no report from report without vulnerabilities in the history charts in an easier way. https://gitlab.com/gitlab-org/gitlab-ee/issues/9069
- We could also report execution errors in dashboards to warn users.
- We could probably avoid the usage ping timeout by replacing the count query on the gigantic ci_job_artifacts table with a query on this dedicated table. https://gitlab.com/gitlab-org/gitlab-ee/issues/7851
- We could filter out projects having security reports in an easier and more performant way, as it will be a simpler lookup in a smaller table. https://gitlab.com/gitlab-org/gitlab-ee/issues/9074
- Show Container Scanning results in the GitLab Container Registry https://gitlab.com/gitlab-org/gitlab-ee/issues/8790
- Make it easier to know if a security report is outdated in a Merge Request. https://gitlab.com/gitlab-org/gitlab-ee/issues/4913
Proposal
Previous proposal
- store a new Report record in DB after each pipeline run, for each report type.
- generated report records will stay forever (until data retention policy is defined)
- as raw JSON artifact report may expire, we need to store with this Report record the data we want to persist and keep after the artifact is removed
- to create a report record when there is no artifact uploaded (e.g. job failure) we need to identify which jobs in the pipeline were supposed to create a report artifact. This is being discussed here: https://gitlab.com/gitlab-org/gitlab-ee/issues/13662#note_210498199
- consider having this record generic enough to support other types of reports (junit, performance, code quality, licenses), not only security ones.
- consider migrating old data (parse artifacts from last X days and create corresponding report records)
- Store a new Scan record in DB after each build, one for each Secure report type (SAST, DAST, etc.) exposed as an artifact during the build.
- Each Scan record represents a scan that happened in a build. If there are multiple scans of the same type (SAST, DAST, etc.), then there should be multiple Scan records. It is not an aggregated result.
- When a build fails to produce the desired Secure report, a Scan record will not be created in the database. This can be improved in future.
- Generated Scan records will stay forever (until data retention policy is defined).
- Scan records will not contain any aggregated information about the result: success, number of vulnerabilities, etc. This can be iterated on and improved later.
- Scan records will not contain a reference to the JobArtifact, nor will they contain any information about the Job Artifact itself.
- The Scan is Secure specific. This should not affect other report types in GitLab (junit, performance, code quality, licenses).
- Jobs with Secure report artifacts should be migrated to create Scan records. This is up for discussion if other migration strategies are discovered.
- Once the Scan is released, a subsequent migration should occur that cleans up and completes the previous migration. Only after this step can the Scan model be used.
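The creation rule in the list above can be sketched in plain Ruby as follows. `Artifact`, `Build`, `Scan`, `SECURE_REPORT_TYPES`, and `scans_for` are hypothetical stand-ins rather than the actual GitLab classes, and the list of Secure report types is an assumption.

```ruby
# Hypothetical stand-ins for illustration; not the actual GitLab classes.
Artifact = Struct.new(:id, :file_type)
Build = Struct.new(:id, :artifacts)
# A scan keeps no reference to the artifact and no aggregated result.
Scan = Struct.new(:build_id, :scan_type)

# Assumed set of Secure report types; the real list may differ.
SECURE_REPORT_TYPES = %w[sast dast dependency_scanning container_scanning].freeze

# Returns one Scan per Secure report artifact exposed by the build.
def scans_for(build)
  build.artifacts
       .select { |artifact| SECURE_REPORT_TYPES.include?(artifact.file_type) }
       .map { |artifact| Scan.new(build.id, artifact.file_type) }
end

builds = [
  Build.new(10, [Artifact.new(1, 'sast'), Artifact.new(2, 'junit')]),
  Build.new(11, [Artifact.new(3, 'sast'), Artifact.new(4, 'dast')]),
  Build.new(12, []) # failed before uploading a report: no scan is created
]

# Two builds ran SAST, so two separate SAST scans exist; nothing is aggregated.
scans = builds.flat_map { |build| scans_for(build) }
```

In this sketch the junit artifact is ignored because the Scan is Secure specific, and build 12 leaves no trace, which matches the limitation noted for failed builds.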
Proposed DB model
Previous proposal
The new Report model should have the following attributes (WIP):
- report_type (sast, dast, dependency_scanning, etc.)
- pipeline_id
- status TBD (e.g. success, failure, missing, etc.)
- errors
- vulnerability_counts TBD (total?, per severity?)
- scan settings TBD (e.g. env variables)
The new Scan model will have the following attributes:
- scan_type (sast, dast, dependency_scanning, etc.)
- build_id
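As a rough sketch of the two proposed attributes, the following plain-Ruby class mimics what a database-backed model might enforce. It does not use Rails; the integer values in `SCAN_TYPES`, the validation rules, and the `scan_type_value` helper are all assumptions made for illustration.

```ruby
# Plain-Ruby sketch of the proposed attributes. In GitLab this would be a
# database-backed model; the integer values below are assumptions.
module Security
  class Scan
    SCAN_TYPES = {
      sast: 1,
      dependency_scanning: 2,
      container_scanning: 3,
      dast: 4
    }.freeze

    attr_reader :build_id, :scan_type

    def initialize(build_id:, scan_type:)
      raise ArgumentError, 'build_id is required' if build_id.nil?
      raise ArgumentError, "unknown scan_type: #{scan_type.inspect}" unless SCAN_TYPES.key?(scan_type)

      @build_id = build_id
      @scan_type = scan_type
    end

    # Integer that would be stored in the scan_type column (hypothetical helper).
    def scan_type_value
      SCAN_TYPES.fetch(scan_type)
    end
  end
end

scan = Security::Scan.new(build_id: 42, scan_type: :dast)
```

Storing the type as a small integer keeps the table narrow, which fits the goal of a simpler lookup in a smaller table.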
Links / references
https://gitlab.slack.com/archives/C8S0HHM44/p1579567350023300
Implementation plan
- Create a Security::Scan model
- Add a worker that, on completion of a Job, populates the Security::Scan for each Job Artifact that is a Security Report
- Add an index that will facilitate efficient migration of all of the previous Job Artifacts
- Add a migration that will migrate Security Report Job Artifacts to Security::Scan
- Release, then follow the migration to make sure it has completed successfully
- Add a migration to steal remaining migration jobs, i.e., synchronously wait for them to complete
- Run database-lab queries to see how many records exist, and for which Secure scan types
- Remove the temporary index created for the first migration
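The two-phase migration in the plan above (a batched background backfill, followed by a step that synchronously finishes whatever remains) could look roughly like this in plain Ruby. Everything here is a hypothetical in-memory stand-in: `LegacyArtifact`, `SECURITY_FILE_TYPES`, `BATCH_SIZE`, and `backfill_batch` are not GitLab APIs, and treating a `[build_id, scan_type]` pair as the identity of a scan is an assumption.

```ruby
require 'set'

# Hypothetical in-memory stand-ins; the real migration would run against the database.
LegacyArtifact = Struct.new(:id, :build_id, :file_type)

SECURITY_FILE_TYPES = %w[sast dast dependency_scanning container_scanning].freeze
BATCH_SIZE = 2 # assumed value, kept tiny for the example

# Backfills scans for one batch of artifacts. `scans` is a Set of
# [build_id, scan_type] pairs, so re-running a batch creates no duplicates.
def backfill_batch(artifacts, scans)
  artifacts.each do |artifact|
    next unless SECURITY_FILE_TYPES.include?(artifact.file_type)

    scans.add([artifact.build_id, artifact.file_type])
  end
end

artifacts = [
  LegacyArtifact.new(1, 100, 'sast'),
  LegacyArtifact.new(2, 100, 'junit'),
  LegacyArtifact.new(3, 101, 'dast'),
  LegacyArtifact.new(4, 102, 'dependency_scanning'),
  LegacyArtifact.new(5, 103, 'sast')
]

scans = Set.new
batches = artifacts.each_slice(BATCH_SIZE).to_a

# Background phase: assume only the first batch ran before the release.
backfill_batch(batches.first, scans)

# Follow-up phase: synchronously run every batch, including ones already done.
batches.each { |batch| backfill_batch(batch, scans) }
```

Because each batch is idempotent in this sketch, the follow-up step can safely replay work that the background phase already completed.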
Edited by Cameron Swords