Skip to content

Migration to create vulnerabilities

rossfuhrman requested to merge rf-vulnerabilities-migration into master

What does this MR do?

Adds a background migration to create vulnerabilities records from vulnerability_occurrences records (vulnerability findings) for projects that do not currently have any vulnerabilities records. It also updates the state of vulnerabilities records for all projects to be dismissed where the corresponding vulnerability_occurrences record has been dismissed.

There are roughly 5,561 projects that need to be migrated. With a two minute delay between scheduling them, that comes up to roughly 7.7 days to run.

All of the below queries were ran against our #database-lab slack channel.

1. Create index on vulnerability_occurrence_pipelines for the inserts

exec create index idx_vuln_occurrence_pipelines_on_pipeline_id_and_occurrence_id on vulnerability_occurrence_pipelines USING btree(pipeline_id, occurrence_id)

The query has been executed. Duration: 1.941 min

2. Add temporary vulnerability_occurrence_id column

exec alter table vulnerabilities add column vulnerability_occurrence_id bigint

The query has been executed. Duration: 15.000 ms

3. Add temporary index on vulnerabilities for update

exec create index index_vulnerabilities_for_occurrence_id_update on vulnerabilities USING btree(vulnerability_occurrence_id)

The query has been executed. Duration: 2.771 s

4. Add temporary index on vulnerability_occurrences for update

exec create index index_vulnerability_occurrences_for_occurrence_id_update on vulnerability_occurrences USING btree(id, project_id)

The query has been executed. Duration: 23.065 s

5. Find all projects that have vulnerability_occurrences records code

https://gitlab.slack.com/archives/CLJMDRD8C/p1586454408382200 (GitLab internal)

https://explain.dalibo.com/plan/TGg

Time: 1.371 s

explain SELECT vulnerability_occurrences.project_id FROM vulnerability_occurrences GROUP BY vulnerability_occurrences.project_id

6. Determine if project already has vulnerabilities migrated

https://gitlab.slack.com/archives/CLJMDRD8C/p1586727939051200

https://explain.dalibo.com/plan/la

Time: 4.770 ms

explain SELECT 1 FROM vulnerabilities WHERE vulnerabilities.project_id = 14492286

7. Find the first vulnerability_occurrence for a project

https://gitlab.slack.com/archives/CLJMDRD8C/p1586456164386300 (GitLab internal)

https://explain.dalibo.com/plan/43Y

Time: 767.045 ms

explain
SELECT "vulnerability_occurrence_pipelines"."occurrence_id" FROM "vulnerability_occurrence_pipelines" WHERE (              vulnerability_occurrence_pipelines.pipeline_id = (
                SELECT MAX(id) AS max_id
                FROM ci_pipelines
                WHERE project_id = 14492286
                AND status='success'
                AND EXISTS (
                  SELECT 1 FROM vulnerability_occurrence_pipelines
                  WHERE "ci_pipelines"."id" = vulnerability_occurrence_pipelines.pipeline_id
                )
              )
) ORDER BY "vulnerability_occurrence_pipelines"."occurrence_id" ASC LIMIT 1

8. Find the first 10,0000 vulnerability_occurrences for a project

https://gitlab.slack.com/archives/CLJMDRD8C/p1586456239388700 (GitLab internal)

https://explain.dalibo.com/plan/kkc

Time: 17.923 ms

explain
SELECT "vulnerability_occurrence_pipelines"."occurrence_id" FROM "vulnerability_occurrence_pipelines" WHERE (              vulnerability_occurrence_pipelines.pipeline_id = (
                SELECT MAX(id) AS max_id
                FROM ci_pipelines
                WHERE project_id = 14492286
                AND status='success'
                AND EXISTS (
                  SELECT 1 FROM vulnerability_occurrence_pipelines
                  WHERE "ci_pipelines"."id" = vulnerability_occurrence_pipelines.pipeline_id
                )
              )
) AND "vulnerability_occurrence_pipelines"."occurrence_id" >= 1273939 ORDER BY "vulnerability_occurrence_pipelines"."occurrence_id" ASC LIMIT 1 OFFSET 10000

9. Find the min and max of the first 10,0000 vulnerability_occurrences for a project

https://gitlab.slack.com/archives/CLJMDRD8C/p1586456337391100 (GitLab internal)

https://explain.dalibo.com/plan/6EC

Time: 6.571 ms

explain
SELECT MIN(occurrence_id), MAX(occurrence_id) FROM "vulnerability_occurrence_pipelines" WHERE (              vulnerability_occurrence_pipelines.pipeline_id = (
                SELECT MAX(id) AS max_id
                FROM ci_pipelines
                WHERE project_id = 14492286
                AND status='success'
                AND EXISTS (
                  SELECT 1 FROM vulnerability_occurrence_pipelines
                  WHERE "ci_pipelines"."id" = vulnerability_occurrence_pipelines.pipeline_id
                )
              )
) AND "vulnerability_occurrence_pipelines"."occurrence_id" >= 1273939 AND "vulnerability_occurrence_pipelines"."occurrence_id" < 1283949

10. Insert vulnerabilities records for the first 10,0000 vulnerability_occurrences for a project

https://gitlab.slack.com/archives/CLJMDRD8C/p1586456595395200 (GitLab internal)

https://explain.dalibo.com/plan/aQJ

Time: 2.038 s

explain
WITH last_pipeline AS (
              SELECT MAX(id) AS id FROM ci_pipelines WHERE project_id = 14492286 AND status='success' AND EXISTS (SELECT 1 FROM vulnerability_occurrence_pipelines WHERE "ci_pipelines"."id" = vulnerability_occurrence_pipelines.pipeline_id)
            )
            INSERT INTO vulnerabilities(project_id, author_id, created_at, updated_at, title, title_html, state, severity, confidence, report_type, vulnerability_occurrence_id)
            SELECT 14492286, ci_pipelines.user_id, vulnerability_occurrences.created_at, vulnerability_occurrences.updated_at, left(vulnerability_occurrences.name, 255), vulnerability_occurrences.name, 1, vulnerability_occurrences.severity, vulnerability_occurrences.confidence, vulnerability_occurrences.report_type, "vulnerability_occurrences"."id"
            FROM vulnerability_occurrence_pipelines
            JOIN last_pipeline ON "last_pipeline"."id" = vulnerability_occurrence_pipelines.pipeline_id
            JOIN  vulnerability_occurrences on vulnerability_occurrence_pipelines.occurrence_id = "vulnerability_occurrences"."id"
            JOIN ci_pipelines ON vulnerability_occurrence_pipelines.pipeline_id = "ci_pipelines"."id"
            WHERE "vulnerability_occurrences"."id" >= 1273939 AND "vulnerability_occurrences"."id" <= 1283950

11. Update the vulnerability_id of the vulnerability_occurrences records that were just created

https://gitlab.slack.com/archives/CLJMDRD8C/p1586457176421800 (GitLab internal)

https://explain.dalibo.com/plan/Pyv

Time: 1.867 s

explain
update vulnerability_occurrences set vulnerability_id = "vulnerabilities"."id"
           from vulnerabilities
           where vulnerabilities.vulnerability_occurrence_id = "vulnerability_occurrences"."id"
           and vulnerability_occurrences.project_id = 14492286
           and "vulnerability_occurrences"."id" >= 1273939 AND "vulnerability_occurrences"."id" <= 1283950;

12. Set the state of vulnerabilities to dismissed for a project

https://gitlab.slack.com/archives/CLJMDRD8C/p1586457259428200 (GitLab internal)

https://explain.dalibo.com/plan/l0

Time: 1.155 ms

explain
UPDATE vulnerabilities
            SET state = 2
            FROM vulnerability_occurrences
            WHERE vulnerability_occurrences.vulnerability_id = "vulnerabilities"."id"
              AND vulnerabilities.state = 1
              AND (
                EXISTS (
                  SELECT 1
                  FROM vulnerability_feedback
                  WHERE vulnerability_occurrences.report_type = vulnerability_feedback.category
                    AND vulnerability_occurrences.project_id = vulnerability_feedback.project_id
                    AND ENCODE(vulnerability_occurrences.project_fingerprint, 'HEX') = vulnerability_feedback.project_fingerprint
                    AND vulnerability_feedback.feedback_type = 0
                )
              )
              AND vulnerability_occurrences.project_id = 14492286;

More details about the number of records we will need to create

This query below was ran against production to find the top 10 projects (based on number of vulnerability_occurrences , using this as a rough guide to which ones will take the longest) that need to have vulnerabilities created.

select vulnerability_occurrences.project_id, count(*) as count from vulnerability_occurrences
where vulnerability_occurrences.project_id in
(
  select vulnerability_occurrences.project_id from vulnerability_occurrences
  where ( not exists ( select NULL from vulnerabilities where (vulnerabilities.project_id = vulnerability_occurrences.project_id)))
  group by vulnerability_occurrences.project_id
)
and ( not exists ( select NULL from vulnerabilities where (vulnerabilities.project_id = vulnerability_occurrences.project_id)))
group by vulnerability_occurrences.project_id
order by count desc
limit 10

https://gitlab.slack.com/archives/C3NBYFJ6N/p1583771537149700 (GitLab internal)

And here are the plans for running against a sampling of those 10 projects (project ids redacted)

vulnerability_occurrences per project number of vulnerabilities to create (linked to query plan)
97440 96480
46531 46531
45897 23
21493 21431
20007 19285
16347 0
15381 0
14713 0
11470 11470
11049 150
Total: 300,328

Issue

#207447 (closed)

Does this MR meet the acceptance criteria?

Conformity

Edited by rossfuhrman

Merge request reports