Skip to content

Remove ci_builds join from Security::Finding.by_build_ids

drew stachon requested to merge remove-ci-join-from-security-finding into master

What does this MR do and why?

This MR removes a join to the ci_builds table from security_scans because it doesn't appear to be needed.

We only use the primary key from ci_builds in our query, so we can query build_id directly on security_scans.

This query is a problem for two Engineering Allocation issues I'm working on: https://gitlab.com/gitlab-org/gitlab/-/issues/24644 (In progress: !71342 (comment 689569275)) and #340256 (closed)

It'll also be a ~"sharding-blocker", either for &6379 (closed) or &6373 (closed). Either way it seems like a nontrivial ~performance improvement as well.

Current Query

scope :by_build_ids, -> (build_ids) { joins(scan: :build).where(ci_builds: { id: build_ids }) }
SELECT "security_findings".* FROM "security_findings" INNER JOIN "security_scans" ON "security_scans"."id" = "security_findings"."scan_id" INNER JOIN "ci_builds" ON "ci_builds"."id" = "security_scans"."build_id" AND "ci_builds"."type" = 'Ci::Build' WHERE "ci_builds"."id" IN (1, 2, 3)

Execution plan: https://gitlab.slack.com/archives/CLJMDRD8C/p1633126596130300

 Nested Loop  (cost=1.71..2040.23 rows=1 width=106) (actual time=5.960..5.962 rows=0 loops=1)
   Buffers: shared hit=24 read=4
   I/O Timings: read=5.876 write=0.000
   ->  Nested Loop  (cost=1.14..19.75 rows=1 width=8) (actual time=5.960..5.961 rows=0 loops=1)
         Buffers: shared hit=24 read=4
         I/O Timings: read=5.876 write=0.000
         ->  Index Scan using ci_builds_pkey on public.ci_builds  (cost=0.58..8.99 rows=3 width=8) (actual time=5.931..5.940 rows=3 loops=1)
               Index Cond: (ci_builds.id = ANY ('{1,2,3}'::bigint[]))
               Filter: ((ci_builds.type)::text = 'Ci::Build'::text)
               Rows Removed by Filter: 0
               Buffers: shared hit=12 read=4
               I/O Timings: read=5.876 write=0.000
         ->  Index Scan using idx_security_scans_on_build_and_scan_type on public.security_scans  (cost=0.56..3.58 rows=1 width=16) (actual time=0.005..0.005 rows=0 loops=3)
               Index Cond: (security_scans.build_id = ci_builds.id)
               Buffers: shared hit=12
               I/O Timings: read=0.000 write=0.000
   ->  Index Scan using index_security_findings_on_scan_id_and_position on public.security_findings  (cost=0.57..1995.17 rows=2531 width=106) (actual time=0.000..0.000 rows=0 loops=0)
         Index Cond: (security_findings.scan_id = security_scans.id)
         I/O Timings: read=0.000 write=0.000

New Query

scope :by_build_ids, -> (build_ids) { joins(:scan).where(security_scans: { build_id: build_ids }) }
SELECT "security_findings".* FROM "security_findings" INNER JOIN "security_scans" ON "security_scans"."id" = "security_findings"."scan_id" WHERE "security_scans"."build_id" IN (1, 2, 3)

Execution plan: https://gitlab.slack.com/archives/CLJMDRD8C/p1633126613133300

 Nested Loop  (cost=1.13..7506.35 rows=92 width=106) (actual time=0.053..0.054 rows=0 loops=1)
   Buffers: shared hit=15
   I/O Timings: read=0.000 write=0.000
   ->  Index Scan using idx_security_scans_on_build_and_scan_type on public.security_scans  (cost=0.56..10.00 rows=3 width=8) (actual time=0.051..0.052 rows=0 loops=1)
         Index Cond: (security_scans.build_id = ANY ('{1,2,3}'::bigint[]))
         Buffers: shared hit=15
         I/O Timings: read=0.000 write=0.000
   ->  Index Scan using index_security_findings_on_scan_id_and_position on public.security_findings  (cost=0.57..2473.48 rows=2531 width=106) (actual time=0.000..0.000 rows=0 loops=0)
         Index Cond: (security_findings.scan_id = security_scans.id)
         I/O Timings: read=0.000 write=0.000

How to set up and validate locally

This is more of a validate-in-database-lab kind of thing, since the production database is where the effect of this will be seen.

I haven't changed any specs, so there shouldn't be any behavioral differences.

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

Edited by drew stachon

Merge request reports