Add a Scan DB model to persist status and metadata of security reports

Problem to solve

In several places, we currently cannot distinguish between having no report available and having a report with no vulnerabilities. This is because we store or fetch only a list of vulnerabilities, so an empty list covers both cases.

Several features need more information about a scan than the list of vulnerabilities provides, and a Scan DB model could supply it.

Note: Previously, it was suggested this new model be called Report. This has been updated to Scan.

Further details

Benefits of having a dedicated record for scans:

We could easily know the last known state to improve vulnerability history chart queries. https://gitlab.com/gitlab-org/gitlab-ee/issues/9526
We could more easily distinguish no report from a report without vulnerabilities in the history charts. https://gitlab.com/gitlab-org/gitlab-ee/issues/9069
We could also report execution errors in dashboards to warn users.
We could probably avoid the usage ping timeout by replacing the count query on the gigantic ci_job_artifacts table with a query on this dedicated table. https://gitlab.com/gitlab-org/gitlab-ee/issues/7851
We could filter out projects having security reports in an easier and more performant way, as it will be a simpler lookup in a smaller table. https://gitlab.com/gitlab-org/gitlab-ee/issues/9074
We could show Container Scanning results in the GitLab Container Registry. https://gitlab.com/gitlab-org/gitlab-ee/issues/8790
We could make it easier to know if a security report is outdated in a Merge Request. https://gitlab.com/gitlab-org/gitlab-ee/issues/4913

Proposal

Previous proposal

store a new Report record in DB after each pipeline run, for each report type.
generated report records will stay forever (until data retention policy is defined)
as raw JSON artifact report may expire, we need to store with this Report record the data we want to persist and keep after the artifact is removed
to create a report record when there is no artifact uploaded (e.g. job failure) we need to identify which jobs in the pipeline were supposed to create a report artifact. This is being discussed here: https://gitlab.com/gitlab-org/gitlab-ee/issues/13662#note_210498199
consider having this record generic enough to support other types of reports (junit, performance, code quality, licenses), not only security ones.
consider migrating old data (parse artifacts from last X days and create corresponding report records)

Store a new Scan record in DB after each build, one for each Secure report type (SAST, DAST etc) exposed as an artifact during the build.
Each Scan record represents a scan that happened in a build. If there are multiple scans of the same type (SAST/DAST etc), then there should be multiple scan records. It is not an aggregated result.
When a build fails to produce the desired Secure report, a scan record will not be created in the database. This can be improved in the future.
Generated Scan records will stay forever (until data retention policy is defined).
Scan records will not contain any aggregated information about the result: success, number of vulnerabilities, etc. This can be iterated on and improved later.
Scan records will not contain a reference to the JobArtifact, nor will they contain any information about the Job Artifact itself.
The scan is Secure specific. This should not affect other report types in GitLab (junit, performance, code quality, licenses).
Jobs with Secure report artifacts should be migrated to create Scan records. This is up for discussion if other migration strategies are discovered.
Once the Scan is released, a subsequent migration should occur that cleans up and completes the previous migration. Only after this step can the Scan model be used.
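
The rules above can be sketched in a few lines. This is only an illustration in Python, not the proposed implementation: the `Scan` dataclass, the `scans_for_build` helper, and the exact contents of `SECURE_REPORT_TYPES` are hypothetical names chosen for the example.

```python
from dataclasses import dataclass

# Assumption: an illustrative set of Secure report types. The issue names
# SAST, DAST, dependency scanning and container scanning; the real list may differ.
SECURE_REPORT_TYPES = {"sast", "dast", "dependency_scanning", "container_scanning"}


@dataclass(frozen=True)
class Scan:
    """Hypothetical stand-in for a Scan record: one scan that happened in a build."""
    build_id: int
    scan_type: str


def scans_for_build(build_id, artifact_types):
    """Return one Scan per Secure report artifact exposed by the build.

    Other report types (junit, code quality, ...) are ignored, and a build
    that produced no Secure report yields no Scan at all.
    """
    return [Scan(build_id, t) for t in artifact_types if t in SECURE_REPORT_TYPES]


# Two builds in the same pipeline both ran SAST: two separate records, no aggregation.
pipeline_scans = scans_for_build(1, ["sast", "junit"]) + scans_for_build(2, ["sast"])

# A failed build that uploaded no Secure report creates nothing.
failed_build_scans = scans_for_build(3, [])
```

Note that under these rules a missing Scan record can still mean either "no scan was configured" or "the scan job failed", which is the limitation the proposal flags for later improvement.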

Proposed DB model

Previous proposal

The new Report model should have the following attributes (WIP):

report_type (sast, dast, dependency_scanning, etc.)
pipeline_id
status TBD (e.g. success, failure, missing, etc.)
errors
vulnerability_counts TBD (total?, per severity?)
scan settings TBD (e.g. env variables)

The new Scan model will have the following attributes:

scan_type (sast, dast, dependency_scanning, etc.)
build_id
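
To make the two attributes concrete, here is a minimal sketch of the table and the lookup it enables, using Python's built-in sqlite3 module. The table name `security_scans`, the `id` column, and the `has_report` helper are assumptions for illustration; the issue only specifies `scan_type` and `build_id`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Assumption: table and column layout are illustrative, not the final schema.
conn.execute(
    """
    CREATE TABLE security_scans (
        id        INTEGER PRIMARY KEY,
        build_id  INTEGER NOT NULL,
        scan_type TEXT    NOT NULL
    )
    """
)

# Build 10 produced a SAST report, build 11 produced a DAST report.
conn.executemany(
    "INSERT INTO security_scans (build_id, scan_type) VALUES (?, ?)",
    [(10, "sast"), (11, "dast")],
)


def has_report(conn, build_id, scan_type):
    """True if a scan of this type was recorded for the build.

    A recorded scan with an empty vulnerability list is a clean report;
    no recorded scan means no report was produced.
    """
    row = conn.execute(
        "SELECT 1 FROM security_scans WHERE build_id = ? AND scan_type = ? LIMIT 1",
        (build_id, scan_type),
    ).fetchone()
    return row is not None
```

With this, `has_report(conn, 10, "sast")` is true while `has_report(conn, 10, "dast")` is false, which is the distinction the vulnerability list alone cannot express.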

Links / references

https://gitlab.slack.com/archives/C8S0HHM44/p1579567350023300

Implementation plan

Create a Security::Scan model
Add a worker that, on completion of a Job, creates a Security::Scan for each Job Artifact that is a Security Report
Add an index that will facilitate efficient migration of all of the previous Job Artifacts
Add a migration that will migrate Security Report Job Artifacts to Security::Scan
Release, then follow the migration to make sure it has completed successfully
Add a migration to steal remaining migration jobs, i.e., synchronously wait for them to complete
Run database-lab queries to see how many records exist, and for which Secure scan types
Remove the temporary index created for the first migration
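
The backfill step in the plan above could look roughly like the following. This is a sketch in Python rather than an actual GitLab migration: the tuple layout, the `backfill` and `batches` helpers, and the batch size are all hypothetical. It also assumes the backfill should skip builds the new worker has already recorded, which the issue does not state explicitly.

```python
# Assumption: illustrative set of Secure report types, as in the proposal.
SECURE_REPORT_TYPES = {"sast", "dast", "dependency_scanning", "container_scanning"}


def batches(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def backfill(artifacts, existing, batch_size=2):
    """Create scan entries for historical Secure report artifacts.

    artifacts: list of (artifact_id, build_id, file_type) tuples.
    existing:  set of (build_id, scan_type) pairs already recorded,
               e.g. by the worker that runs on job completion.
    Returns the pairs created by this run; `existing` is updated in place.
    """
    secure = [a for a in artifacts if a[2] in SECURE_REPORT_TYPES]
    created = []
    for batch in batches(secure, batch_size):
        for _artifact_id, build_id, file_type in batch:
            key = (build_id, file_type)
            if key not in existing:  # skip records that already exist
                existing.add(key)
                created.append(key)
    return created


artifacts = [(1, 10, "sast"), (2, 10, "junit"), (3, 11, "dast"), (4, 12, "sast")]
existing = {(12, "sast")}  # build 12 was already handled by the worker

first_run = backfill(artifacts, existing)
second_run = backfill(artifacts, existing)  # re-running creates nothing new
```

Making the backfill safe to re-run is what allows the later "steal remaining jobs" migration to finish any unprocessed batches without duplicating records.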

Edited Mar 31, 2020 by Cameron Swords