Populate `resolved_on_default_branch` column for existing vulnerabilities (!40755) · Merge requests · GitLab.org / GitLab

Mehmet Emin INAC requested to merge 227114_populate_resolved_on_default_branch_column_for_existing_vulnerabilities_second_try into master Aug 28, 2020

What does this MR do?

This MR introduces a post-migration task to schedule a data migration job which will populate the resolved_on_default_branch column of existing vulnerabilities.

There was a previous attempt to run this migration with !38795 (merged) which has been reverted from upstream/master because of the errors occurred in the staging environment. Those errors are covered in this MR and the migration version has been changed.

Errors happened in the first attempt;

Migraration up

== 20200826220745 AddCompoundIndexOnVulnerabilitiesForBackgroundMigration: migrating
-- transaction_open?()
   -> 0.0000s
-- index_exists?(:vulnerabilities, [:project_id, :id], {:name=>"index_vulnerabilities_on_project_id_and_id", :algorithm=>:concurrently})
   -> 0.0059s
-- add_index(:vulnerabilities, [:project_id, :id], {:name=>"index_vulnerabilities_on_project_id_and_id", :algorithm=>:concurrently})
   -> 0.0031s
== 20200826220745 AddCompoundIndexOnVulnerabilitiesForBackgroundMigration: migrated (0.0093s)

== 20200826220746 SchedulePopulateResolvedOnDefaultBranchColumn: migrating ====
-- table_exists?(:projects)
   -> 0.0007s
-- table_exists?(:vulnerabilities)
   -> 0.0005s
== 20200826220746 SchedulePopulateResolvedOnDefaultBranchColumn: migrated (0.1239s)

Migraration down

== 20200826220745 AddCompoundIndexOnVulnerabilitiesForBackgroundMigration: reverting
-- transaction_open?()
   -> 0.0000s
-- indexes(:vulnerabilities)
   -> 0.0070s
-- remove_index(:vulnerabilities, {:algorithm=>:concurrently, :name=>"index_vulnerabilities_on_project_id_and_id"})
   -> 0.0020s
== 20200826220745 AddCompoundIndexOnVulnerabilitiesForBackgroundMigration: reverted (0.0094s)

== 20200826220746 SchedulePopulateResolvedOnDefaultBranchColumn: reverting ====
== 20200826220746 SchedulePopulateResolvedOnDefaultBranchColumn: reverted (0.0000s)

Database Queries

Scheduler post-migration task

The scheduler post-migration task schedules a data migration task for every 100 projects by using the following query multiple times;

SELECT DISTINCT
    "vulnerabilities"."project_id"
FROM
    "vulnerabilities"
WHERE
    "vulnerabilities"."project_id" >= 1
ORDER BY
    "vulnerabilities"."project_id" ASC
LIMIT 1 OFFSET 100

 Limit  (cost=1480.47..1495.27 rows=1 width=8) (actual time=957.839..957.839 rows=1 loops=1)
   Buffers: shared hit=637 read=971
   I/O Timings: read=917.190
   ->  Unique  (cost=0.43..79123.27 rows=5346 width=8) (actual time=0.056..957.820 rows=101 loops=1)
         Buffers: shared hit=637 read=971
         I/O Timings: read=917.190
         ->  Index Only Scan using index_vulnerabilities_on_project_id on public.vulnerabilities  (cost=0.43..72160.16 rows=2785242 width=8) (actual time=0.054..942.893 rows=134338 loops=1)
               Index Cond: (vulnerabilities.project_id >= 1)
               Heap Fetches: 0
               Buffers: shared hit=637 read=971
               I/O Timings: read=917.190

Background task

Background task takes project_ids argument as an array of integers and runs the migration for each of them with the following queries;

Load project

SELECT
    "projects".*
FROM
    "projects"
WHERE
    "projects"."id" = $project_id
LIMIT 1

 Limit  (cost=0.43..3.45 rows=1 width=723) (actual time=11.330..11.330 rows=1 loops=1)
   Buffers: shared read=4
   I/O Timings: read=11.218
   ->  Index Scan using projects_pkey on public.projects  (cost=0.43..3.45 rows=1 width=723) (actual time=11.327..11.327 rows=1 loops=1)
         Index Cond: (projects.id = 278964)
         Buffers: shared read=4
         I/O Timings: read=11.218

Load Route

This task also loads the route entry for each project as it has some crucial information for project entity like so;

SELECT
    "routes".*
FROM
    "routes"
WHERE
    "routes"."source_id" = $project_id
    AND "routes"."source_type" = 'Project'
LIMIT 1

 Limit  (cost=0.56..3.58 rows=1 width=80) (actual time=13.741..13.742 rows=1 loops=1)
   Buffers: shared read=5
   I/O Timings: read=13.681
   ->  Index Scan using index_routes_on_source_type_and_source_id on public.routes  (cost=0.56..3.58 rows=1 width=80) (actual time=13.740..13.740 rows=1 loops=1)
         Index Cond: (((routes.source_type)::text = 'Project'::text) AND (routes.source_id = 278964))
         Buffers: shared read=5
         I/O Timings: read=13.681

Load latest successful pipeline with security reports

After loading the project & route entries, the task tries to load the latest successful pipeline ID with security reports to load the artifact entry later based on.

SELECT
    "ci_pipelines"."id"
FROM
    "ci_pipelines"
WHERE ci_pipelines.project_id = 278964
AND ci_pipelines.ref = 'master'
AND ci_pipelines.status IN ('success')
AND (EXISTS (
        SELECT
            1
        FROM
            "ci_builds"
        WHERE
            "ci_builds"."type" = 'Ci::Build'
            AND ("ci_builds"."retried" IS FALSE
                OR "ci_builds"."retried" IS NULL)
            AND (EXISTS (
                    SELECT
                        1
                    FROM
                        "ci_job_artifacts"
                    WHERE (ci_builds.id = ci_job_artifacts.job_id)
                    AND "ci_job_artifacts"."file_type" IN (5, 6, 7, 8, 21, 23)))
            AND (ci_pipelines.id = ci_builds.commit_id)))
ORDER BY
    "ci_pipelines"."id" DESC
LIMIT 1

Limit  (cost=1.84..23444.68 rows=1 width=4) (actual time=0.927..0.927 rows=1 loops=1)
   Buffers: shared hit=959
   ->  Nested Loop Semi Join  (cost=1.84..25599582.66 rows=1092 width=4) (actual time=0.926..0.926 rows=1 loops=1)
         Buffers: shared hit=959
         ->  Index Only Scan using index_ci_pipelines_on_project_id_and_ref_and_status_and_id on public.ci_pipelines  (cost=0.57..2948.65 rows=79343 width=4) (actual time=0.040..0.040 rows=1 loops=1)
               Index Cond: ((ci_pipelines.project_id = 278964) AND (ci_pipelines.ref = 'master'::text) AND (ci_pipelines.status = 'success'::text))
               Heap Fetches: 1
               Buffers: shared hit=6
         ->  Nested Loop Semi Join  (cost=1.27..322.60 rows=1 width=4) (actual time=0.884..0.884 rows=1 loops=1)
               Buffers: shared hit=953
               ->  Index Scan using index_ci_builds_on_commit_id_and_status_and_type on public.ci_builds  (cost=0.70..102.17 rows=113 width=8) (actual time=0.016..0.215 rows=121 loops=1)
                     Index Cond: ((ci_builds.commit_id = ci_pipelines.id) AND ((ci_builds.type)::text = 'Ci::Build'::text))
                     Filter: ((ci_builds.retried IS FALSE) OR (ci_builds.retried IS NULL))
                     Rows Removed by Filter: 0
                     Buffers: shared hit=129
               ->  Index Only Scan using index_ci_job_artifacts_on_job_id_and_file_type on public.ci_job_artifacts  (cost=0.57..1.94 rows=1 width=4) (actual time=0.005..0.005 rows=0 loops=121)
                     Index Cond: (ci_job_artifacts.job_id = ci_builds.id)
                     Heap Fetches: 427
                     Filter: (ci_job_artifacts.file_type = ANY ('{5,6,7,8,21,23}'::integer[]))
                     Rows Removed by Filter: 4
                     Buffers: shared hit=824

Update disappeared vulnerabilities

UPDATE
    "vulnerabilities"
SET
    "resolved_on_default_branch" = true
WHERE
    "vulnerabilities"."project_id" = $project_id
    AND "vulnerabilities"."id" NOT IN (
        SELECT
            "vulnerabilities"."id"
        FROM
            "vulnerabilities"
            INNER JOIN vulnerability_occurrences vo ON vo.vulnerability_id = vulnerabilities.id
            INNER JOIN vulnerability_occurrence_pipelines vop ON vop.occurrence_id = vo.id
                AND vop.pipeline_id = 158
        WHERE
            "vulnerabilities"."project_id" = $project_id)

 ModifyTable on public.vulnerabilities  (cost=3631.71..9330.49 rows=1933 width=295) (actual time=1514.764..1514.764 rows=0 loops=1)
   Buffers: shared hit=139381 read=1472 dirtied=1552
   I/O Timings: read=1274.050
   ->  Index Scan using index_vulnerabilities_on_project_id on public.vulnerabilities  (cost=3631.71..9330.49 rows=1933 width=295) (actual time=0.065..46.498 rows=3974 loops=1)
         Index Cond: (vulnerabilities.project_id = 278964)
         Filter: (NOT (hashed SubPlan 1))
         Rows Removed by Filter: 0
         Buffers: shared hit=4542 dirtied=78
         SubPlan 1
           ->  Nested Loop  (cost=1.43..3631.28 rows=1 width=8) (actual time=0.010..0.010 rows=0 loops=1)
                 Buffers: shared hit=4
                 ->  Nested Loop  (cost=1.00..3192.98 rows=914 width=8) (actual time=0.010..0.010 rows=0 loops=1)
                       Buffers: shared hit=4
                       ->  Index Scan using index_vulnerability_occurrence_pipelines_on_pipeline_id on public.vulnerability_occurrence_pipelines vop  (cost=0.57..67.47 rows=914 width=8) (actual time=0.009..0.009 rows=0 loops=1)
                             Index Cond: (vop.pipeline_id = 158)
                             Buffers: shared hit=4
                       ->  Index Scan using vulnerability_occurrences_pkey on public.vulnerability_occurrences vo  (cost=0.43..3.42 rows=1 width=16) (actual time=0.000..0.000 rows=0 loops=0)
                             Index Cond: (vo.id = vop.occurrence_id)
                 ->  Index Scan using vulnerabilities_pkey on public.vulnerabilities vulnerabilities_1  (cost=0.43..0.48 rows=1 width=8) (actual time=0.000..0.000 rows=0 loops=0)
                       Index Cond: (vulnerabilities_1.id = vo.vulnerability_id)
                       Filter: (vulnerabilities_1.project_id = 278964)
                       Rows Removed by Filter: 0

Expected timing

explain select distinct project_id from vulnerabilities

6590 project ids.

Each migration for a project takes ~1.5 seconds which means each enqueued job will take ~2mins but the gap between the jobs is 5mins just to make sure they don't overlap. Based on this information, ETA is as following;

(6590 / 100) * 5 minutes ~= 330 minutes to finish the migration for all projects.

Related to #227114 (closed)

Does this MR meet the acceptance criteria?

Conformity

Availability and Testing

[-] Review and add/update tests for this feature/bug. Consider all test levels. See the Test Planning Process.
[-] Tested in all supported browsers
[-] Informed Infrastructure department of a default or new setting change, if applicable per definition of done

Security

If this MR contains changes to processing or storing of credentials or tokens, authorization and authentication methods and other items described in the security review guidelines:

[-] Label as security and @ mention @gitlab-com/gl-security/appsec
[-] The MR includes necessary changes to maintain consistency between UI, API, email, or other methods
[-] Security reports checked/validated by a reviewer from the AppSec team

Edited Sep 02, 2020 by Mehmet Emin INAC

Populate `resolved_on_default_branch` column for existing vulnerabilities