Skip to content

Add partition number to security findings query

What does this MR do and why?

Add partition number to security findings query

All records for a particular pipeline are guaranteed to be on the same partition in the security_findings table so we can add the partition number as a parameter when querying the table for a pipeline.

See https://gitlab.com/gitlab-org/gitlab/-/blob/73c6eb71d7a05f45eea772135c6c0b1c117e0d2c/ee/app/models/ee/ci/pipeline.rb#L188-197

      # EE::Ci::Pipeline

      # We want all the `security_findings` records for a particular pipeline to be stored in
      # the same partition, therefore, we check if the pipeline already has a `security_scan`.
      #
      # - If it has, then we use the partition number of the existing security_scan to make sure
      # that the new `security_findings` will be stored in the same partition with the existing ones.
      # - If it does not have a security_scan yet, then we can basically use the latest partition
      # of the `security_findings` table.
      def security_findings_partition_number
        @security_findings_partition_number ||= security_scans.first&.findings_partition_number || Security::Finding.active_partition_number
      end

This change adds:

... FROM security_findings WHERE security_findings.partition_number = n

to the query in Security::FindingsFinder.

As a result of this the query only needs to visit one partition. Without this clause all active partitions are scanner - 12 at the time of writing.

Note: This change does not drastically change the buffer read or time performance of the query. But it does dramatically reduce the number of locks taken. I have not listed them here, they can be seen through the postgres.ai links, but the number of AccessShareLocks on the query drops from 168 before to 60 after.

Plan before

explain SELECT
    "security_findings".*
FROM
    "security_scans",
    unnest('{1,2,4,5,6,7}'::smallint[]) AS "severities" ("severity"),
    LATERAL (
        SELECT
            "security_findings".*
        FROM
            "security_findings"
        WHERE ("security_findings"."scan_id" = "security_scans"."id")
        AND ("security_findings"."severity" = "severities"."severity")
        AND "security_findings"."deduplicated" = TRUE
        AND "security_findings"."confidence" IN (1, 2, 3, 4, 5, 6, 7)
        AND (NOT EXISTS (
                SELECT
                    1
                FROM
                    "vulnerabilities"
                    INNER JOIN "vulnerability_occurrences" ON "vulnerability_occurrences"."vulnerability_id" = "vulnerabilities"."id"
                WHERE
                    "vulnerabilities"."state" = 2
                    AND (vulnerability_occurrences.uuid::uuid = security_findings.uuid::uuid)))
        ORDER BY
            "security_findings"."severity" DESC,
            "security_findings"."id" ASC
        LIMIT 20) AS "security_findings"
WHERE
    "security_scans"."pipeline_id" = 967613589
    AND "security_scans"."latest" = TRUE
    AND "security_scans"."status" = 1
    AND "security_scans"."scan_type" IN (1, 2, 3, 4, 5, 6, 7)
ORDER BY
    "security_findings"."severity" DESC,
    "security_findings"."id" ASC
LIMIT 20 OFFSET 0
 Limit  (cost=1498.31..1498.36 rows=20 width=1153) (actual time=26.516..26.523 rows=20 loops=1)
   Buffers: shared hit=9160
   I/O Timings: read=0.000 write=0.000
   ->  Sort  (cost=1498.31..1498.61 rows=120 width=1153) (actual time=26.515..26.520 rows=20 loops=1)
         Sort Key: security_findings.severity DESC, security_findings.id
         Sort Method: top-N heapsort  Memory: 66kB
         Buffers: shared hit=9160
         I/O Timings: read=0.000 write=0.000
         ->  Nested Loop  (cost=9.32..1495.12 rows=120 width=1153) (actual time=22.880..26.373 rows=80 loops=1)
               Buffers: shared hit=9157
               I/O Timings: read=0.000 write=0.000
               ->  Index Scan using index_security_scans_on_pipeline_id_and_scan_type on public.security_scans  (cost=0.57..7.63 rows=1 width=8) (actual time=0.026..0.028 rows=1 loops=1)
                     Index Cond: (security_scans.pipeline_id = 967613589)
                     Filter: (security_scans.latest AND (security_scans.status = 1) AND (security_scans.scan_type = ANY ('{1,2,3,4,5,6,7}'::integer[])))
                     Rows Removed by Filter: 0
                     Buffers: shared hit=8
                     I/O Timings: read=0.000 write=0.000
               ->  Nested Loop  (cost=8.75..1486.28 rows=120 width=1153) (actual time=22.851..26.323 rows=80 loops=1)
                     Buffers: shared hit=9149
                     I/O Timings: read=0.000 write=0.000
                     ->  Function Scan on unnest severities  (cost=0.00..0.06 rows=6 width=2) (actual time=0.009..0.013 rows=6 loops=1)
                           I/O Timings: read=0.000 write=0.000
                     ->  Limit  (cost=8.75..247.30 rows=20 width=1153) (actual time=4.194..4.381 rows=13 loops=6)
                           Buffers: shared hit=9149
                           I/O Timings: read=0.000 write=0.000
                           ->  Nested Loop Anti Join  (cost=8.75..26822.08 rows=2248 width=1153) (actual time=4.193..4.378 rows=13 loops=6)
                                 Buffers: shared hit=9149
                                 I/O Timings: read=0.000 write=0.000
                                 ->  Merge Append  (cost=7.62..15525.03 rows=2427 width=1153) (actual time=4.167..4.175 rows=13 loops=6)
                                       Sort Key: security_findings.id
                                       Buffers: shared hit=8399
                                       I/O Timings: read=0.000 write=0.000
                                       ->  Index Scan using security_findings_69_scan_id_id_idx on gitlab_partitions_dynamic.security_findings_69 security_findings_1  (cost=0.57..1344.60 rows=212 width=1144) (actual time=0.007..0.007 rows=0 loops=6)
                                             Index Cond: (security_findings_1.scan_id = security_scans.id)
                                             Filter: (security_findings_1.deduplicated AND (security_findings_1.severity = severities.severity) AND (security_findings_1.confidence = ANY ('{1,2,3,4,5,6,7}'::integer[])))
                                             Rows Removed by Filter: 0
                                             Buffers: shared hit=24
                                             I/O Timings: read=0.000 write=0.000
                                       ->  Index Scan using security_findings_70_scan_id_id_idx on gitlab_partitions_dynamic.security_findings_70 security_findings_2  (cost=0.57..1366.13 rows=217 width=1116) (actual time=0.005..0.005 rows=0 loops=6)
                                             Index Cond: (security_findings_2.scan_id = security_scans.id)
                                             Filter: (security_findings_2.deduplicated AND (security_findings_2.severity = severities.severity) AND (security_findings_2.confidence = ANY ('{1,2,3,4,5,6,7}'::integer[])))
                                             Rows Removed by Filter: 0
                                             Buffers: shared hit=24
                                             I/O Timings: read=0.000 write=0.000
                                       ->  Index Scan using security_findings_71_scan_id_id_idx on gitlab_partitions_dynamic.security_findings_71 security_findings_3  (cost=0.57..1247.04 rows=198 width=1106) (actual time=0.007..0.007 rows=0 loops=6)
                                             Index Cond: (security_findings_3.scan_id = security_scans.id)
                                             Filter: (security_findings_3.deduplicated AND (security_findings_3.severity = severities.severity) AND (security_findings_3.confidence = ANY ('{1,2,3,4,5,6,7}'::integer[])))
                                             Rows Removed by Filter: 0
                                             Buffers: shared hit=24
                                             I/O Timings: read=0.000 write=0.000
                                       ->  Index Scan using security_findings_72_scan_id_id_idx on gitlab_partitions_dynamic.security_findings_72 security_findings_4  (cost=0.57..1218.64 rows=192 width=1115) (actual time=4.088..4.094 rows=13 loops=6)
                                             Index Cond: (security_findings_4.scan_id = security_scans.id)
                                             Filter: (security_findings_4.deduplicated AND (security_findings_4.severity = severities.severity) AND (security_findings_4.confidence = ANY ('{1,2,3,4,5,6,7}'::integer[])))
                                             Rows Removed by Filter: 9756
                                             Buffers: shared hit=8111
                                             I/O Timings: read=0.000 write=0.000
                                       ->  Index Scan using security_findings_73_scan_id_id_idx on gitlab_partitions_dynamic.security_findings_73 security_findings_5  (cost=0.57..1155.55 rows=182 width=1127) (actual time=0.007..0.007 rows=0 loops=6)
                                             Index Cond: (security_findings_5.scan_id = security_scans.id)
                                             Filter: (security_findings_5.deduplicated AND (security_findings_5.severity = severities.severity) AND (security_findings_5.confidence = ANY ('{1,2,3,4,5,6,7}'::integer[])))
                                             Rows Removed by Filter: 0
                                             Buffers: shared hit=24
                                             I/O Timings: read=0.000 write=0.000
                                       ->  Index Scan using security_findings_74_scan_id_id_idx on gitlab_partitions_dynamic.security_findings_74 security_findings_6  (cost=0.57..1251.21 rows=196 width=1154) (actual time=0.005..0.005 rows=0 loops=6)
                                             Index Cond: (security_findings_6.scan_id = security_scans.id)
                                             Filter: (security_findings_6.deduplicated AND (security_findings_6.severity = severities.severity) AND (security_findings_6.confidence = ANY ('{1,2,3,4,5,6,7}'::integer[])))
                                             Rows Removed by Filter: 0
                                             Buffers: shared hit=24
                                             I/O Timings: read=0.000 write=0.000
                                       ->  Index Scan using security_findings_75_scan_id_id_idx on gitlab_partitions_dynamic.security_findings_75 security_findings_7  (cost=0.57..1238.34 rows=193 width=1175) (actual time=0.005..0.005 rows=0 loops=6)
                                             Index Cond: (security_findings_7.scan_id = security_scans.id)
                                             Filter: (security_findings_7.deduplicated AND (security_findings_7.severity = severities.severity) AND (security_findings_7.confidence = ANY ('{1,2,3,4,5,6,7}'::integer[])))
                                             Rows Removed by Filter: 0
                                             Buffers: shared hit=24
                                             I/O Timings: read=0.000 write=0.000
                                       ->  Index Scan using security_findings_76_scan_id_id_idx on gitlab_partitions_dynamic.security_findings_76 security_findings_8  (cost=0.57..1294.98 rows=202 width=1174) (actual time=0.007..0.007 rows=0 loops=6)
                                             Index Cond: (security_findings_8.scan_id = security_scans.id)
                                             Filter: (security_findings_8.deduplicated AND (security_findings_8.severity = severities.severity) AND (security_findings_8.confidence = ANY ('{1,2,3,4,5,6,7}'::integer[])))
                                             Rows Removed by Filter: 0
                                             Buffers: shared hit=24
                                             I/O Timings: read=0.000 write=0.000
                                       ->  Index Scan using security_findings_77_scan_id_id_idx on gitlab_partitions_dynamic.security_findings_77 security_findings_9  (cost=0.57..1230.44 rows=191 width=1190) (actual time=0.006..0.006 rows=0 loops=6)
                                             Index Cond: (security_findings_9.scan_id = security_scans.id)
                                             Filter: (security_findings_9.deduplicated AND (security_findings_9.severity = severities.severity) AND (security_findings_9.confidence = ANY ('{1,2,3,4,5,6,7}'::integer[])))
                                             Rows Removed by Filter: 0
                                             Buffers: shared hit=24
                                             I/O Timings: read=0.000 write=0.000
                                       ->  Index Scan using security_findings_78_scan_id_id_idx on gitlab_partitions_dynamic.security_findings_78 security_findings_10  (cost=0.57..1246.73 rows=194 width=1196) (actual time=0.006..0.006 rows=0 loops=6)
                                             Index Cond: (security_findings_10.scan_id = security_scans.id)
                                             Filter: (security_findings_10.deduplicated AND (security_findings_10.severity = severities.severity) AND (security_findings_10.confidence = ANY ('{1,2,3,4,5,6,7}'::integer[])))
                                             Rows Removed by Filter: 0
                                             Buffers: shared hit=24
                                             I/O Timings: read=0.000 write=0.000
                                       ->  Index Scan using security_findings_79_scan_id_id_idx on gitlab_partitions_dynamic.security_findings_79 security_findings_11  (cost=0.57..1154.09 rows=179 width=1209) (actual time=0.004..0.004 rows=0 loops=6)
                                             Index Cond: (security_findings_11.scan_id = security_scans.id)
                                             Filter: (security_findings_11.deduplicated AND (security_findings_11.severity = severities.severity) AND (security_findings_11.confidence = ANY ('{1,2,3,4,5,6,7}'::integer[])))
                                             Rows Removed by Filter: 0
                                             Buffers: shared hit=24
                                             I/O Timings: read=0.000 write=0.000
                                       ->  Index Scan using security_findings_80_scan_id_id_idx on gitlab_partitions_dynamic.security_findings_80 security_findings_12  (cost=0.57..1013.73 rows=161 width=1133) (actual time=0.006..0.006 rows=0 loops=6)
                                             Index Cond: (security_findings_12.scan_id = security_scans.id)
                                             Filter: (security_findings_12.deduplicated AND (security_findings_12.severity = severities.severity) AND (security_findings_12.confidence = ANY ('{1,2,3,4,5,6,7}'::integer[])))
                                             Rows Removed by Filter: 0
                                             Buffers: shared hit=24
                                             I/O Timings: read=0.000 write=0.000
                                       ->  Index Scan using security_findings_81_scan_id_id_idx on gitlab_partitions_dynamic.security_findings_81 security_findings_13  (cost=0.56..706.29 rows=110 width=1154) (actual time=0.005..0.005 rows=0 loops=6)
                                             Index Cond: (security_findings_13.scan_id = security_scans.id)
                                             Filter: (security_findings_13.deduplicated AND (security_findings_13.severity = severities.severity) AND (security_findings_13.confidence = ANY ('{1,2,3,4,5,6,7}'::integer[])))
                                             Rows Removed by Filter: 0
                                             Buffers: shared hit=24
                                             I/O Timings: read=0.000 write=0.000
                                 ->  Nested Loop  (cost=1.14..4.64 rows=1 width=16) (actual time=0.015..0.015 rows=0 loops=80)
                                       Buffers: shared hit=750
                                       I/O Timings: read=0.000 write=0.000
                                       ->  Index Only Scan using index_vuln_findings_on_uuid_including_vuln_id_1 on public.vulnerability_occurrences  (cost=0.57..2.11 rows=1 width=24) (actual time=0.009..0.009 rows=1 loops=80)
                                             Index Cond: (vulnerability_occurrences.uuid = security_findings.uuid)
                                             Heap Fetches: 2
                                             Buffers: shared hit=350
                                             I/O Timings: read=0.000 write=0.000
                                       ->  Index Scan using vulnerabilities_pkey on public.vulnerabilities  (cost=0.57..2.53 rows=1 width=8) (actual time=0.005..0.005 rows=0 loops=80)
                                             Index Cond: (vulnerabilities.id = vulnerability_occurrences.vulnerability_id)
                                             Filter: (vulnerabilities.state = 2)
                                             Rows Removed by Filter: 1
                                             Buffers: shared hit=400
                                             I/O Timings: read=0.000 write=0.000
Time: 60.193 ms
  - planning: 33.267 ms
  - execution: 26.926 ms
    - I/O read: 0.000 ms
    - I/O write: 0.000 ms

Shared buffers:
  - hits: 9160 (~71.60 MiB) from the buffer pool
  - reads: 0 from the OS file cache, including disk I/O
  - dirtied: 0
  - writes: 0

https://console.postgres.ai/gitlab/gitlab-production-tunnel-pg12/sessions/23297/commands/74901

Plan after

SELECT
    "security_findings".*
FROM
    "security_scans",
    unnest('{1,2,4,5,6,7}'::smallint[]) AS "severities" ("severity"),
    LATERAL (
        SELECT
            "security_findings".*
        FROM
            "security_findings"
        WHERE ("security_findings"."scan_id" = "security_scans"."id")
        AND ("security_findings"."severity" = "severities"."severity")
        AND "security_findings"."deduplicated" = TRUE
        AND "security_findings"."partition_number" = 72
        AND "security_findings"."confidence" IN (1, 2, 3, 4, 5, 6, 7)
        AND (NOT EXISTS (
                SELECT
                    1
                FROM
                    "vulnerabilities"
                    INNER JOIN "vulnerability_occurrences" ON "vulnerability_occurrences"."vulnerability_id" = "vulnerabilities"."id"
                WHERE
                    "vulnerabilities"."state" = 2
                    AND (vulnerability_occurrences.uuid::uuid = security_findings.uuid::uuid)))
        ORDER BY
            "security_findings"."severity" DESC,
            "security_findings"."id" ASC
        LIMIT 20) AS "security_findings"
WHERE
    "security_scans"."pipeline_id" = 967613589
    AND "security_scans"."latest" = TRUE
    AND "security_scans"."status" = 1
    AND "security_scans"."scan_type" IN (1, 2, 3, 4, 5, 6, 7)
ORDER BY
    "security_findings"."severity" DESC,
    "security_findings"."id" ASC
LIMIT 20 OFFSET 0
 Limit  (cost=1204.18..1204.23 rows=20 width=1115) (actual time=32.620..32.626 rows=20 loops=1)
   Buffers: shared hit=8875
   I/O Timings: read=0.000 write=0.000
   ->  Sort  (cost=1204.18..1204.48 rows=120 width=1115) (actual time=32.618..32.622 rows=20 loops=1)
         Sort Key: security_findings.severity DESC, security_findings.id
         Sort Method: top-N heapsort  Memory: 66kB
         Buffers: shared hit=8875
         I/O Timings: read=0.000 write=0.000
         ->  Nested Loop  (cost=2.27..1200.99 rows=120 width=1115) (actual time=28.406..32.442 rows=80 loops=1)
               Buffers: shared hit=8869
               I/O Timings: read=0.000 write=0.000
               ->  Index Scan using index_security_scans_on_pipeline_id_and_scan_type on public.security_scans  (cost=0.57..7.63 rows=1 width=8) (actual time=0.036..0.038 rows=1 loops=1)
                     Index Cond: (security_scans.pipeline_id = 967613589)
                     Filter: (security_scans.latest AND (security_scans.status = 1) AND (security_scans.scan_type = ANY ('{1,2,3,4,5,6,7}'::integer[])))
                     Rows Removed by Filter: 0
                     Buffers: shared hit=8
                     I/O Timings: read=0.000 write=0.000
               ->  Nested Loop  (cost=1.71..1192.15 rows=120 width=1115) (actual time=28.366..32.379 rows=80 loops=1)
                     Buffers: shared hit=8861
                     I/O Timings: read=0.000 write=0.000
                     ->  Function Scan on unnest severities  (cost=0.00..0.06 rows=6 width=2) (actual time=0.029..0.036 rows=6 loops=1)
                           I/O Timings: read=0.000 write=0.000
                     ->  Limit  (cost=1.70..198.28 rows=20 width=1115) (actual time=5.182..5.384 rows=13 loops=6)
                           Buffers: shared hit=8861
                           I/O Timings: read=0.000 write=0.000
                           ->  Nested Loop Anti Join  (cost=1.70..1751.26 rows=178 width=1115) (actual time=5.181..5.381 rows=13 loops=6)
                                 Buffers: shared hit=8861
                                 I/O Timings: read=0.000 write=0.000
                                 ->  Index Scan using security_findings_72_scan_id_id_idx on gitlab_partitions_dynamic.security_findings_72 security_findings  (cost=0.57..1221.60 rows=192 width=1115) (actual time=5.147..5.154 rows=13 loops=6)
                                       Index Cond: (security_findings.scan_id = security_scans.id)
                                       Filter: (security_findings.deduplicated AND (security_findings.severity = severities.severity) AND (security_findings.partition_number = 72) AND (security_findings.confidence = ANY ('{1,2,3,4,5,6,7}'::integer[])))
                                       Rows Removed by Filter: 9756
                                       Buffers: shared hit=8111
                                       I/O Timings: read=0.000 write=0.000
                                 ->  Nested Loop  (cost=1.14..2.75 rows=1 width=16) (actual time=0.016..0.016 rows=0 loops=80)
                                       Buffers: shared hit=750
                                       I/O Timings: read=0.000 write=0.000
                                       ->  Index Only Scan using index_vuln_findings_on_uuid_including_vuln_id_1 on public.vulnerability_occurrences  (cost=0.57..2.12 rows=1 width=24) (actual time=0.010..0.010 rows=1 loops=80)
                                             Index Cond: (vulnerability_occurrences.uuid = security_findings.uuid)
                                             Heap Fetches: 2
                                             Buffers: shared hit=350
                                             I/O Timings: read=0.000 write=0.000
                                       ->  Index Scan using vulnerabilities_pkey on public.vulnerabilities  (cost=0.57..0.63 rows=1 width=8) (actual time=0.006..0.006 rows=0 loops=80)
                                             Index Cond: (vulnerabilities.id = vulnerability_occurrences.vulnerability_id)
                                             Filter: (vulnerabilities.state = 2)
                                             Rows Removed by Filter: 1
                                             Buffers: shared hit=400
                                             I/O Timings: read=0.000 write=0.000
Time: 46.054 ms
  - planning: 13.200 ms
  - execution: 32.854 ms
    - I/O read: 0.000 ms
    - I/O write: 0.000 ms

Shared buffers:
  - hits: 8875 (~69.30 MiB) from the buffer pool
  - reads: 0 from the OS file cache, including disk I/O
  - dirtied: 0
  - writes: 0

https://console.postgres.ai/gitlab/gitlab-production-tunnel-pg12/sessions/23297/commands/74902

Related to #381302

Edited by Malcolm Locke

Merge request reports