Skip to content

Denormalize `ci_builds` columns into `security_scans`

Subashis Chakraborty requested to merge 333414-add-columns-from-ci-builds into master

What does this MR do?

Related to #333414 (closed) #336198 (closed)

This MR adds project_id and pipeline_id columns to security_scans table. This also backfills the data from ci_builds for these columns.

The goal of this denormalization is to avoid joins that would have had to cross databases after sharding.

Subsequent MR

This MR will change several queries which using joins with ci_builds. These changes will come as part of this #337453 (closed). I opened an MR for this !67358 (diffs) which will take care if the changes in 14.3.

Migrations:

Up

== 20210722151951 AddColumnsToSecurityScans: migrating ========================
-- add_column(:security_scans, :project_id, :bigint)
   -> 0.0016s
-- add_column(:security_scans, :pipeline_id, :bigint)
   -> 0.0008s
== 20210722151951 AddColumnsToSecurityScans: migrated (0.0111s) ===============

== 20210728174349 AddFkToSecurityScansColumns: migrating ======================
-- transaction_open?()
   -> 0.0000s
-- index_exists?(:security_scans, :project_id, {:algorithm=>:concurrently})
   -> 0.0040s
-- execute("SET statement_timeout TO 0")
   -> 0.0007s
-- add_index(:security_scans, :project_id, {:algorithm=>:concurrently})
   -> 0.0068s
-- execute("RESET ALL")
   -> 0.0007s
-- transaction_open?()
   -> 0.0000s
-- foreign_keys(:security_scans)
   -> 0.0037s
-- execute("ALTER TABLE security_scans\nADD CONSTRAINT fk_dbc89265b9\nFOREIGN KEY (project_id)\nREFERENCES projects (id)\nON DELETE CASCADE\nNOT VALID;\n")
   -> 0.0089s
-- execute("ALTER TABLE security_scans VALIDATE CONSTRAINT fk_dbc89265b9;")
   -> 0.0121s
-- transaction_open?()
   -> 0.0000s
-- index_exists?(:security_scans, :pipeline_id, {:algorithm=>:concurrently})
   -> 0.0024s
-- add_index(:security_scans, :pipeline_id, {:algorithm=>:concurrently})
   -> 0.0025s
== 20210728174349 AddFkToSecurityScansColumns: migrated (0.0577s) =============

== 20210811214811 ScheduleCopyCiBuildsColumnsToSecurityScans: migrating =======
-- Scheduled 1 CopyCiBuildsColumnsToSecurityScans jobs with a maximum of 5000 records per batch and an interval of 120 seconds.

The migration is expected to take at least 120 seconds. Expect all jobs to have completed after 2021-08-11 21:57:10 UTC."
== 20210811214811 ScheduleCopyCiBuildsColumnsToSecurityScans: migrated (0.0511s)

Down

== 20210722151951 AddColumnsToSecurityScans: reverting ========================
-- remove_column(:security_scans, :project_id, :bigint)
   -> 0.0013s
-- remove_column(:security_scans, :pipeline_id, :bigint)
   -> 0.0007s
== 20210722151951 AddColumnsToSecurityScans: reverted (0.0096s) ===============


== 20210728174349 AddFkToSecurityScansColumns: reverting ======================
-- remove_foreign_key(:security_scans, {:column=>:project_id})
   -> 0.0052s
-- transaction_open?()
   -> 0.0000s
-- indexes(:security_scans)
   -> 0.0031s
-- execute("SET statement_timeout TO 0")
   -> 0.0005s
-- remove_index(:security_scans, {:algorithm=>:concurrently, :name=>"index_security_scans_on_project_id"})
   -> 0.0028s
-- execute("RESET ALL")
   -> 0.0005s
-- transaction_open?()
   -> 0.0000s
-- indexes(:security_scans)
   -> 0.0018s
-- remove_index(:security_scans, {:algorithm=>:concurrently, :name=>"index_security_scans_on_pipeline_id"})
   -> 0.0019s
== 20210728174349 AddFkToSecurityScansColumns: reverted (0.0184s) =============

== 20210811214811 ScheduleCopyCiBuildsColumnsToSecurityScans: reverting =======
== 20210811214811 ScheduleCopyCiBuildsColumnsToSecurityScans: reverted (0.0000s) 

Execution plan and timings

ModifyTable on public.security_scans  (cost=1.01..1764.69 rows=459 width=72) (actual time=780.656..780.658 rows=0 loops=1)
   Buffers: shared hit=15040 read=704 dirtied=73 written=14
   I/O Timings: read=747.815 write=0.000
   ->  Nested Loop  (cost=1.01..1764.69 rows=459 width=72) (actual time=5.214..750.831 rows=500 loops=1)
         Buffers: shared hit=1907 read=684 dirtied=1
         I/O Timings: read=742.657 write=0.000
         ->  Index Scan using security_scans_pkey on public.security_scans  (cost=0.43..42.79 rows=478 width=46) (actual time=0.161..3.720 rows=500 loops=1)
               Index Cond: ((security_scans.id >= 4636) AND (security_scans.id <= 5141))
               Buffers: shared hit=50 read=38
               I/O Timings: read=2.981 write=0.000
         ->  Index Scan using ci_builds_pkey on public.ci_builds  (cost=0.58..3.60 rows=1 width=18) (actual time=1.492..1.492 rows=1 loops=500)
               Index Cond: (ci_builds.id = security_scans.build_id)
               Filter: ((ci_builds.type)::text = 'Ci::Build'::text)
               Rows Removed by Filter: 0
               Buffers: shared hit=1857 read=646 dirtied=1
               I/O Timings: read=739.676 write=0.000

Estimated execution time

Records in table: 9_460_892
Batch size: 5_000
Batches: 1893 (every batch will have 10 updates, 500 records in each update)
Delay: 120s

(1893 * 0.866 * 10) + (1893 * 120) = 16394 + 227160 = 243554 seconds ~= 68 hours ~= 3 days
Edited by Subashis Chakraborty

Merge request reports