Migration to create vulnerabilities
What does this MR do?
Adds a background migration to create vulnerabilities records from vulnerability_occurrences records (vulnerability findings) for projects that do not currently have any vulnerabilities records. It also updates the state of vulnerabilities records for all projects to be dismissed where the corresponding vulnerability_occurrences record has been dismissed.
There are roughly 5,561 projects that need to be migrated. With a two minute delay between scheduling them, that comes up to roughly 7.7 days to run.
All of the below queries were ran against our #database-lab slack channel.
1. Create index on vulnerability_occurrence_pipelines for the inserts
exec create index idx_vuln_occurrence_pipelines_on_pipeline_id_and_occurrence_id on vulnerability_occurrence_pipelines USING btree(pipeline_id, occurrence_id)
The query has been executed. Duration: 1.941 min
2. Add temporary vulnerability_occurrence_id column
exec alter table vulnerabilities add column vulnerability_occurrence_id bigint
The query has been executed. Duration: 15.000 ms
3. Add temporary index on vulnerabilities for update
exec create index index_vulnerabilities_for_occurrence_id_update on vulnerabilities USING btree(vulnerability_occurrence_id)
The query has been executed. Duration: 2.771 s
4. Add temporary index on vulnerability_occurrences for update
exec create index index_vulnerability_occurrences_for_occurrence_id_update on vulnerability_occurrences USING btree(id, project_id)
The query has been executed. Duration: 23.065 s
code
5. Find all projects that have vulnerability_occurrences recordshttps://gitlab.slack.com/archives/CLJMDRD8C/p1586454408382200 (GitLab internal)
https://explain.dalibo.com/plan/TGg
Time: 1.371 s
explain SELECT “vulnerability_occurrences”.“project_id” FROM “vulnerability_occurrences” GROUP BY “vulnerability_occurrences”.“project_id”
6. Determine if project already has vulnerabilities migrated
https://gitlab.slack.com/archives/CLJMDRD8C/p1586727939051200
https://explain.dalibo.com/plan/la
Time: 4.770 ms
explain SELECT 1 FROM vulnerabilities WHERE vulnerabilities.project_id = 14492286
7. Find the first vulnerability_occurrence for a project
https://gitlab.slack.com/archives/CLJMDRD8C/p1586456164386300 (GitLab internal)
https://explain.dalibo.com/plan/43Y
Time: 767.045 ms
explain
SELECT "vulnerability_occurrence_pipelines"."occurrence_id" FROM "vulnerability_occurrence_pipelines" WHERE ( vulnerability_occurrence_pipelines.pipeline_id = (
SELECT MAX(id) AS max_id
FROM ci_pipelines
WHERE project_id = 14492286
AND status='success'
AND EXISTS (
SELECT 1 FROM vulnerability_occurrence_pipelines
WHERE "ci_pipelines"."id" = vulnerability_occurrence_pipelines.pipeline_id
)
)
) ORDER BY "vulnerability_occurrence_pipelines"."occurrence_id" ASC LIMIT 1
8. Find the first 10,0000 vulnerability_occurrences for a project
https://gitlab.slack.com/archives/CLJMDRD8C/p1586456239388700 (GitLab internal)
https://explain.dalibo.com/plan/kkc
Time: 17.923 ms
explain
SELECT "vulnerability_occurrence_pipelines"."occurrence_id" FROM "vulnerability_occurrence_pipelines" WHERE ( vulnerability_occurrence_pipelines.pipeline_id = (
SELECT MAX(id) AS max_id
FROM ci_pipelines
WHERE project_id = 14492286
AND status='success'
AND EXISTS (
SELECT 1 FROM vulnerability_occurrence_pipelines
WHERE "ci_pipelines"."id" = vulnerability_occurrence_pipelines.pipeline_id
)
)
) AND "vulnerability_occurrence_pipelines"."occurrence_id" >= 1273939 ORDER BY "vulnerability_occurrence_pipelines"."occurrence_id" ASC LIMIT 1 OFFSET 10000
9. Find the min and max of the first 10,0000 vulnerability_occurrences for a project
https://gitlab.slack.com/archives/CLJMDRD8C/p1586456337391100 (GitLab internal)
https://explain.dalibo.com/plan/6EC
Time: 6.571 ms
explain
SELECT MIN(occurrence_id), MAX(occurrence_id) FROM "vulnerability_occurrence_pipelines" WHERE ( vulnerability_occurrence_pipelines.pipeline_id = (
SELECT MAX(id) AS max_id
FROM ci_pipelines
WHERE project_id = 14492286
AND status='success'
AND EXISTS (
SELECT 1 FROM vulnerability_occurrence_pipelines
WHERE "ci_pipelines"."id" = vulnerability_occurrence_pipelines.pipeline_id
)
)
) AND "vulnerability_occurrence_pipelines"."occurrence_id" >= 1273939 AND "vulnerability_occurrence_pipelines"."occurrence_id" < 1283949
10. Insert vulnerabilities records for the first 10,0000 vulnerability_occurrences for a project
https://gitlab.slack.com/archives/CLJMDRD8C/p1586456595395200 (GitLab internal)
https://explain.dalibo.com/plan/aQJ
Time: 2.038 s
explain
WITH last_pipeline AS (
SELECT MAX(id) AS id FROM ci_pipelines WHERE project_id = 14492286 AND status='success' AND EXISTS (SELECT 1 FROM vulnerability_occurrence_pipelines WHERE "ci_pipelines"."id" = vulnerability_occurrence_pipelines.pipeline_id)
)
INSERT INTO vulnerabilities(project_id, author_id, created_at, updated_at, title, title_html, state, severity, confidence, report_type, vulnerability_occurrence_id)
SELECT 14492286, ci_pipelines.user_id, vulnerability_occurrences.created_at, vulnerability_occurrences.updated_at, left(vulnerability_occurrences.name, 255), vulnerability_occurrences.name, 1, vulnerability_occurrences.severity, vulnerability_occurrences.confidence, vulnerability_occurrences.report_type, "vulnerability_occurrences"."id"
FROM vulnerability_occurrence_pipelines
JOIN last_pipeline ON "last_pipeline"."id" = vulnerability_occurrence_pipelines.pipeline_id
JOIN vulnerability_occurrences on vulnerability_occurrence_pipelines.occurrence_id = "vulnerability_occurrences"."id"
JOIN ci_pipelines ON vulnerability_occurrence_pipelines.pipeline_id = "ci_pipelines"."id"
WHERE "vulnerability_occurrences"."id" >= 1273939 AND "vulnerability_occurrences"."id" <= 1283950
11. Update the vulnerability_id of the vulnerability_occurrences records that were just created
https://gitlab.slack.com/archives/CLJMDRD8C/p1586457176421800 (GitLab internal)
https://explain.dalibo.com/plan/Pyv
Time: 1.867 s
explain
update vulnerability_occurrences set vulnerability_id = "vulnerabilities"."id"
from vulnerabilities
where vulnerabilities.vulnerability_occurrence_id = "vulnerability_occurrences"."id"
and vulnerability_occurrences.project_id = 14492286
and "vulnerability_occurrences"."id" >= 1273939 AND "vulnerability_occurrences"."id" <= 1283950;
12. Set the state of vulnerabilities to dismissed for a project
https://gitlab.slack.com/archives/CLJMDRD8C/p1586457259428200 (GitLab internal)
https://explain.dalibo.com/plan/l0
Time: 1.155 ms
explain
UPDATE vulnerabilities
SET state = 2
FROM vulnerability_occurrences
WHERE vulnerability_occurrences.vulnerability_id = "vulnerabilities"."id"
AND vulnerabilities.state = 1
AND (
EXISTS (
SELECT 1
FROM vulnerability_feedback
WHERE vulnerability_occurrences.report_type = vulnerability_feedback.category
AND vulnerability_occurrences.project_id = vulnerability_feedback.project_id
AND ENCODE(vulnerability_occurrences.project_fingerprint, 'HEX') = vulnerability_feedback.project_fingerprint
AND vulnerability_feedback.feedback_type = 0
)
)
AND vulnerability_occurrences.project_id = 14492286;
More details about the number of records we will need to create
This query below was ran against production to find the top 10 projects (based on number of vulnerability_occurrences , using this as a rough guide to which ones will take the longest) that need to have vulnerabilities created.
select vulnerability_occurrences.project_id, count(*) as count from vulnerability_occurrences
where vulnerability_occurrences.project_id in
(
select vulnerability_occurrences.project_id from vulnerability_occurrences
where ( not exists ( select NULL from vulnerabilities where (vulnerabilities.project_id = vulnerability_occurrences.project_id)))
group by vulnerability_occurrences.project_id
)
and ( not exists ( select NULL from vulnerabilities where (vulnerabilities.project_id = vulnerability_occurrences.project_id)))
group by vulnerability_occurrences.project_id
order by count desc
limit 10
https://gitlab.slack.com/archives/C3NBYFJ6N/p1583771537149700 (GitLab internal)
And here are the plans for running against a sampling of those 10 projects (project ids redacted)
vulnerability_occurrences per project | number of vulnerabilities to create (linked to query plan) |
---|---|
97440 | 96480 |
46531 | 46531 |
45897 | 23 |
21493 | 21431 |
20007 | 19285 |
16347 | 0 |
15381 | 0 |
14713 | 0 |
11470 | 11470 |
11049 | 150 |
Total: 300,328 |