Draft: PoC for removing partitioned data using foreign keys
What does this MR do and why?
Related to #358943 (closed)
This partitions all of the tables described in https://docs.gitlab.com/ee/architecture/blueprints/ci_data_decay/pipeline_partitioning.html#why-do-we-need-to-partition-cicd-data and creates 5 partitions for each. It also adds some horrible hacks to allow the execution of Ci::CreatePipelineService
for data insertion:
# Configure which partition to use for data storage: 100..105
[1] pry(main)> Thread.current[:current_partition_id] = 103
=> 103
[2] pry(main)> Ci::CreatePipelineService.new(Project.find(75), User.first, { ref: 'main', variables_attributes: [ {key: 'GIT_STRATEGY', value: 'none' } ] }).execute!(:web, {})
=> #<ServiceResponse:0x00007ff272967960
@http_status=:ok,
@message=nil,
@payload=
#<Ci::Pipeline:0x00007ff286766300
id: 544,
ref: "main",
sha: "bc47a984eb5f8b5712a420ac1c043396e05f973f",
before_sha: "0000000000000000000000000000000000000000",
created_at: Thu, 14 Jul 2022 12:44:39.769820000 UTC +00:00,
updated_at: Thu, 14 Jul 2022 12:44:39.769820000 UTC +00:00,
tag: false,
yaml_errors: nil,
committed_at: nil,
project_id: 75,
status: "created",
started_at: nil,
finished_at: nil,
duration: nil,
user_id: 1,
lock_version: 0,
auto_canceled_by_id: nil,
pipeline_schedule_id: nil,
source: "web",
config_source: "repository_source",
protected: true,
failure_reason: nil,
iid: 57,
merge_request_id: nil,
source_sha: nil,
target_sha: nil,
external_pull_request_id: nil,
ci_ref_id: 73,
locked: "artifacts_locked",
partition_id: 103>,
@status=:success>
Data is inserted in the right partition:
gitlabhq_development_ci=# select id, iid, partition_id, project_id from p_ci_pipelines_103;
id | iid | partition_id | project_id
-----+-----+--------------+------------
520 | 33 | 103 | 75
521 | 34 | 103 | 75
522 | 35 | 103 | 75
544 | 57 | 103 | 75
(4 rows)
gitlabhq_development_ci=# select id, name, commit_id, partition_id from p_ci_builds_103;
id | name | commit_id | partition_id
-------+---------+-----------+--------------
26751 | build1 | 520 | 103
26752 | test1 | 520 | 103
26753 | test2 | 520 | 103
26754 | deploy1 | 520 | 103
26755 | build1 | 521 | 103
26756 | test1 | 521 | 103
26757 | test2 | 521 | 103
26758 | deploy1 | 521 | 103
26759 | build1 | 522 | 103
26760 | test1 | 522 | 103
26761 | test2 | 522 | 103
26762 | deploy1 | 522 | 103
26847 | build1 | 544 | 103
26848 | test1 | 544 | 103
26849 | test2 | 544 | 103
26850 | deploy1 | 544 | 103
(16 rows)
Screenshots or screen recordings
These are strongly recommended to assist reviewers and reduce the time to merge your change.
How to set up and validate locally
Numbered steps to set up and validate the change are strongly suggested.
MR acceptance checklist
This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.
-
I have evaluated the MR acceptance checklist for this MR.
Merge request reports
Activity
changed milestone to %15.5
assigned to @mbobin
- A deleted user
added database databasereview pending labels
1 Error CHANGELOG missing: To create a changelog entry, add the
Changelog
trailer to one of your Git commit messages.This merge request requires a changelog entry because it introduces a database migration.
4 Warnings c9ffe315: Commits that change 30 or more lines across at least 3 files should describe these changes in the commit body. For more information, take a look at our Commit message guidelines. ab0f06b8: Commits that change 30 or more lines across at least 3 files should describe these changes in the commit body. For more information, take a look at our Commit message guidelines. New migrations added but db/structure.sql wasn't updated Usually, when adding new migrations, db/structure.sql should be
updated too (unless the migration isn't changing the DB schema
and isn't the most recent one).You've made some app changes, but didn't add any tests.
That's OK as long as you're refactoring existing code,
but please consider adding any of the maintenancepipelines, maintenanceworkflow, documentation, QA labels.1 Message This merge request adds or changes files that require a review from the Database team. This merge request requires a database review. To make sure these changes are reviewed, take the following steps:
-
Ensure the merge request has database and databasereview pending labels. If the merge request modifies database files, Danger will do this for you.
-
Prepare your MR for database review according to the docs.
-
Assign and mention the database reviewer suggested by Reviewer Roulette.
-
If this is not a Community contribution or from a Fork, kick off the
db:gitlabcom-database-testing
manual job.
The following files require a review from the Database team:
db/migrate/20220714081515_partition_ci_pipelines.rb
db/migrate/20220714081824_partition_ci_pipelines_config.rb
db/migrate/20220714083150_partition_ci_builds.rb
db/migrate/20220714084050_partition_ci_builds_metadata.rb
db/migrate/20220714084531_partition_ci_job_artifacts.rb
db/migrate/20220714084829_partition_ci_stages.rb
db/migrate/20220714085021_partition_ci_pipeline_variables.rb
db/migrate/20220714103547_create_ci_partitions.rb
db/schema_migrations/20220714081515
db/schema_migrations/20220714081824
db/schema_migrations/20220714083150
db/schema_migrations/20220714084050
db/schema_migrations/20220714084531
db/schema_migrations/20220714084829
db/schema_migrations/20220714085021
db/schema_migrations/20220714103547
Reviewer roulette
Changes that require review have been detected!
Please refer to the table below for assigning reviewers and maintainers suggested by Danger in the specified category:
Category Reviewer Maintainer backend Tarun Vellishetty ( @tvellishetty
) (UTC+5.5, 2.5 hours ahead of@mbobin
)Pavel Shutsin ( @pshutsin
) (UTC+2, 1 hour behind@mbobin
)database Maxime Orefice ( @morefice
) (UTC+2, 1 hour behind@mbobin
)João Alexandre Prado Tavares Cunha ( @Alexand
) (UTC-3, 6 hours behind@mbobin
)~migration No reviewer available No maintainer available To spread load more evenly across eligible reviewers, Danger has picked a candidate for each review slot, based on their timezone. Feel free to override these selections if you think someone else would be better-suited or use the GitLab Review Workload Dashboard to find other available reviewers.
To read more on how to use the reviewer roulette, please take a look at the Engineering workflow and code review guidelines. Please consider assigning a reviewer or maintainer who is a domain expert in the area of the merge request.
Once you've decided who will review this merge request, assign them as a reviewer! Danger does not automatically notify them for you.
If needed, you can retry the
danger-review
job that generated this comment.Generated by
Danger-
removed database databasereview pending labels
338 350 scope :for_id, -> (id) { where(id: id) } 339 351 scope :for_iid, -> (iid) { where(iid: iid) } 340 352 scope :for_project, -> (project_id) { where(project_id: project_id) } 341 scope :created_after, -> (time) { where('ci_pipelines.created_at > ?', time) } 342 scope :created_before_id, -> (id) { where('ci_pipelines.id < ?', id) } 353 scope :created_after, -> (time) { where("#{quoted_table_name}.created_at > ?", time) } 354 scope :created_before_id, -> (id) { where("#{quoted_table_name}.id < ?", id) } MR opened: !92535 (merged)
I think we need to add indexes and FK constraints for each partition:
gitlabhq_development_ci=# \d+ p_ci_pipelines Partitioned table "public.p_ci_pipelines" Column | Type | Collation | Nullable | Default | Storage | Stats target | Description --------------------------+-----------------------------+-----------+----------+------------------------------------------+----------+--------------+------------- id | integer | | not null | nextval('ci_pipelines_id_seq'::regclass) | plain | | ref | character varying | | | | extended | | sha | character varying | | | | extended | | before_sha | character varying | | | | extended | | created_at | timestamp without time zone | | | | plain | | updated_at | timestamp without time zone | | | | plain | | tag | boolean | | | false | plain | | yaml_errors | text | | | | extended | | committed_at | timestamp without time zone | | | | plain | | project_id | integer | | | | plain | | status | character varying | | | | extended | | started_at | timestamp without time zone | | | | plain | | finished_at | timestamp without time zone | | | | plain | | duration | integer | | | | plain | | user_id | integer | | | | plain | | lock_version | integer | | | 0 | plain | | auto_canceled_by_id | integer | | | | plain | | pipeline_schedule_id | integer | | | | plain | | source | integer | | | | plain | | config_source | integer | | | | plain | | protected | boolean | | | | plain | | failure_reason | integer | | | | plain | | iid | integer | | | | plain | | merge_request_id | integer | | | | plain | | source_sha | bytea | | | | extended | | target_sha | bytea | | | | extended | | external_pull_request_id | bigint | | | | plain | | ci_ref_id | bigint | | | | plain | | locked | smallint | | not null | 1 | plain | | partition_id | bigint | | not null | 100 | plain | | Partition key: LIST (partition_id) Indexes: "p_ci_pipelines_id_partition_id" UNIQUE CONSTRAINT, btree (id, partition_id) Partitions: ci_pipelines FOR VALUES IN ('100'), p_ci_pipelines_101 FOR VALUES IN ('101'), p_ci_pipelines_102 FOR VALUES IN ('102'), p_ci_pipelines_103 FOR VALUES IN ('103'), p_ci_pipelines_104 FOR VALUES IN ('104'), p_ci_pipelines_105 FOR VALUES IN ('105') gitlabhq_development_ci=# \d+ p_ci_pipelines_103 Table "public.p_ci_pipelines_103" Column | Type | Collation | Nullable | Default | Storage | Stats target | Description --------------------------+-----------------------------+-----------+----------+------------------------------------------+----------+--------------+------------- id | integer | | not null | nextval('ci_pipelines_id_seq'::regclass) | plain | | ref | character varying | | | | extended | | sha | character varying | | | | extended | | before_sha | character varying | | | | extended | | created_at | timestamp without time zone | | | | plain | | updated_at | timestamp without time zone | | | | plain | | tag | boolean | | | false | plain | | yaml_errors | text | | | | extended | | committed_at | timestamp without time zone | | | | plain | | project_id | integer | | | | plain | | status | character varying | | | | extended | | started_at | timestamp without time zone | | | | plain | | finished_at | timestamp without time zone | | | | plain | | duration | integer | | | | plain | | user_id | integer | | | | plain | | lock_version | integer | | | 0 | plain | | auto_canceled_by_id | integer | | | | plain | | pipeline_schedule_id | integer | | | | plain | | source | integer | | | | plain | | config_source | integer | | | | plain | | protected | boolean | | | | plain | | failure_reason | integer | | | | plain | | iid | integer | | | | plain | | merge_request_id | integer | | | | plain | | source_sha | bytea | | | | extended | | target_sha | bytea | | | | extended | | external_pull_request_id | bigint | | | | plain | | ci_ref_id | bigint | | | | plain | | locked | smallint | | not null | 1 | plain | | partition_id | bigint | | not null | 100 | plain | | Partition of: p_ci_pipelines FOR VALUES IN ('103') Partition constraint: ((partition_id IS NOT NULL) AND (partition_id = '103'::bigint)) Indexes: "p_ci_pipelines_103_id_partition_id_key" UNIQUE CONSTRAINT, btree (id, partition_id) Access method: heap gitlabhq_development_ci=# select id, iid, partition_id, project_id from p_ci_pipelines_103; id | iid | partition_id | project_id -----+-----+--------------+------------ 520 | 33 | 103 | 75 521 | 34 | 103 | 75 522 | 35 | 103 | 75 544 | 57 | 103 | 75 (4 rows) gitlabhq_development_ci=# delete from p_ci_pipelines_103 where id = 544; DELETE 1 gitlabhq_development_ci=# select id, name, commit_id, partition_id from p_ci_builds_103 where commit_id = 544; id | name | commit_id | partition_id -------+---------+-----------+-------------- 26847 | build1 | 544 | 103 26848 | test1 | 544 | 103 26849 | test2 | 544 | 103 26850 | deploy1 | 544 | 103 (4 rows)
After adding the constraints, cascade deletion works:
gitlabhq_development_ci=# ALTER TABLE ONLY p_ci_pipelines ADD UNIQUE (id, partition_id); ALTER TABLE gitlabhq_development_ci=# ALTER TABLE p_ci_builds ADD CONSTRAINT fk_d3130c9a7f FOREIGN KEY (commit_id, partition_id) REFERENCES p_ci_pipelines(id, partition_id) ON DELETE CASCADE; ALTER TABLE
gitlabhq_development_ci=# select id, iid, partition_id, project_id from p_ci_pipelines_103; id | iid | partition_id | project_id -----+-----+--------------+------------ 520 | 33 | 103 | 75 521 | 34 | 103 | 75 522 | 35 | 103 | 75 (3 rows) gitlabhq_development_ci=# select id, name, commit_id, partition_id from p_ci_builds_103 where commit_id = 522; id | name | commit_id | partition_id -------+---------+-----------+-------------- 26759 | build1 | 522 | 103 26760 | test1 | 522 | 103 26761 | test2 | 522 | 103 26762 | deploy1 | 522 | 103 (4 rows) gitlabhq_development_ci=# delete from p_ci_pipelines_103 where id = 522; DELETE 1 gitlabhq_development_ci=# select id, name, commit_id, partition_id from p_ci_builds_103 where commit_id = 522; id | name | commit_id | partition_id ----+------+-----------+-------------- (0 rows)
Deleting a pipeline by
id
fromp_ci_pipelines
deletes the corresponding builds from the right partition:gitlabhq_development_ci=# select id, iid, partition_id, project_id from p_ci_pipelines_103; id | iid | partition_id | project_id -----+-----+--------------+------------ 520 | 33 | 103 | 75 521 | 34 | 103 | 75 (2 rows) gitlabhq_development_ci=# select id, name, commit_id, partition_id from p_ci_builds_103 where commit_id = 521; id | name | commit_id | partition_id -------+---------+-----------+-------------- 26755 | build1 | 521 | 103 26756 | test1 | 521 | 103 26757 | test2 | 521 | 103 26758 | deploy1 | 521 | 103 (4 rows) gitlabhq_development_ci=# delete from p_ci_pipelines where id = 521; DELETE 1 gitlabhq_development_ci=# select id, name, commit_id, partition_id from p_ci_builds_103 where commit_id = 521; id | name | commit_id | partition_id ----+------+-----------+-------------- (0 rows)
When the
partition_id
is not provided it goes though each partition and it triggers FK events when needed:gitlabhq_development_ci=# explain analyze delete from p_ci_pipelines where id = 518; QUERY PLAN ----------------------------------------------------------------------------------------------------------------------------------------------------------------- Delete on p_ci_pipelines (cost=0.15..12.98 rows=6 width=6) (actual time=0.060..0.062 rows=0 loops=1) Delete on ci_pipelines Delete on p_ci_pipelines_101 Delete on p_ci_pipelines_102 Delete on p_ci_pipelines_103 Delete on p_ci_pipelines_104 Delete on p_ci_pipelines_105 -> Index Scan using ci_pipelines_pkey on ci_pipelines (cost=0.15..2.17 rows=1 width=6) (actual time=0.013..0.013 rows=0 loops=1) Index Cond: (id = 518) -> Index Scan using p_ci_pipelines_101_id_partition_id_key on p_ci_pipelines_101 (cost=0.14..2.16 rows=1 width=6) (actual time=0.005..0.006 rows=0 loops=1) Index Cond: (id = 518) -> Index Scan using p_ci_pipelines_102_id_partition_id_key on p_ci_pipelines_102 (cost=0.14..2.16 rows=1 width=6) (actual time=0.009..0.010 rows=1 loops=1) Index Cond: (id = 518) -> Index Scan using p_ci_pipelines_103_id_partition_id_key on p_ci_pipelines_103 (cost=0.14..2.16 rows=1 width=6) (actual time=0.007..0.007 rows=0 loops=1) Index Cond: (id = 518) -> Index Scan using p_ci_pipelines_104_id_partition_id_key on p_ci_pipelines_104 (cost=0.14..2.16 rows=1 width=6) (actual time=0.006..0.006 rows=0 loops=1) Index Cond: (id = 518) -> Index Scan using p_ci_pipelines_105_id_partition_id_key on p_ci_pipelines_105 (cost=0.14..2.16 rows=1 width=6) (actual time=0.005..0.005 rows=0 loops=1) Index Cond: (id = 518) Planning Time: 0.580 ms Trigger for constraint p_ci_builds_commit_id_partition_id_fkey2 on p_ci_pipelines_102: time=0.198 calls=1 Execution Time: 0.327 ms (22 rows)
With
partition_id
:gitlabhq_development_ci=# explain analyze delete from p_ci_pipelines where id = 517 and partition_id = 102; QUERY PLAN ----------------------------------------------------------------------------------------------------------------------------------------------------------------- Delete on p_ci_pipelines (cost=0.14..2.16 rows=1 width=6) (actual time=0.039..0.040 rows=0 loops=1) Delete on p_ci_pipelines_102 -> Index Scan using p_ci_pipelines_102_id_partition_id_key on p_ci_pipelines_102 (cost=0.14..2.16 rows=1 width=6) (actual time=0.023..0.023 rows=1 loops=1) Index Cond: ((id = 517) AND (partition_id = 102)) Planning Time: 0.161 ms Trigger for constraint p_ci_builds_commit_id_partition_id_fkey2 on p_ci_pipelines_102: time=0.238 calls=1 Execution Time: 0.302 ms (7 rows)
After fiddling with the config and log levels it seems that the trigger constraints propagate the partition_id value:
2022-07-14 16:51:30.782 EEST [67558] CONTEXT: SQL statement "DELETE FROM "public"."p_ci_builds" WHERE $1 OPERATOR(pg_catalog.=) "commit_id" AND $2 OPERATOR(pg_catalog.=) "partition_id"" 2022-07-14 16:51:30.782 EEST [67558] STATEMENT: explain analyze delete from p_ci_pipelines where id = 516 and partition_id = 102;
Wall of text from the log
2022-07-14 16:51:30.779 EEST [67558] DEBUG: StartTransaction(1) name: unnamed; blockState: DEFAULT; state: INPROGRESS, xid/subid/cid: 0/1/0 2022-07-14 16:51:30.779 EEST [67558] LOG: statement: explain analyze delete from p_ci_pipelines where id = 516 and partition_id = 102; 2022-07-14 16:51:30.780 EEST [67558] LOG: parse tree: 2022-07-14 16:51:30.780 EEST [67558] DETAIL: {QUERY :commandType 5 :querySource 0 :canSetTag true :utilityStmt ? :resultRelation 0 :hasAggs false :hasWindowFuncs false :hasTargetSRFs false :hasSubLinks false :hasDistinctOn false :hasRecursive false :hasModifyingCTE false :hasForUpdate false :hasRowSecurity false :cteList <> :rtable <> :jointree <> :targetList <> :override 0 :onConflict <> :returningList <> :groupClause <> :groupingSets <> :havingQual <> :windowClause <> :distinctClause <> :sortClause <> :limitOffset <> :limitCount <> :rowMarks <> :setOperations <> :constraintDeps <> :withCheckOptions <> :stmt_location 0 :stmt_len 80 } 2022-07-14 16:51:30.780 EEST [67558] STATEMENT: explain analyze delete from p_ci_pipelines where id = 516 and partition_id = 102; 2022-07-14 16:51:30.780 EEST [67558] LOG: rewritten parse tree: 2022-07-14 16:51:30.780 EEST [67558] DETAIL: ( {QUERY :commandType 5 :querySource 0 :canSetTag true :utilityStmt ? :resultRelation 0 :hasAggs false :hasWindowFuncs false :hasTargetSRFs false :hasSubLinks false :hasDistinctOn false :hasRecursive false :hasModifyingCTE false :hasForUpdate false :hasRowSecurity false :cteList <> :rtable <> :jointree <> :targetList <> :override 0 :onConflict <> :returningList <> :groupClause <> :groupingSets <> :havingQual <> :windowClause <> :distinctClause <> :sortClause <> :limitOffset <> :limitCount <> :rowMarks <> :setOperations <> :constraintDeps <> :withCheckOptions <> :stmt_location 0 :stmt_len 80 } ) 2022-07-14 16:51:30.780 EEST [67558] STATEMENT: explain analyze delete from p_ci_pipelines where id = 516 and partition_id = 102; 2022-07-14 16:51:30.780 EEST [67558] LOG: plan: 2022-07-14 16:51:30.780 EEST [67558] DETAIL: {PLANNEDSTMT :commandType 4 :queryId 0 :hasReturning false :hasModifyingCTE false :canSetTag true :transientPlan false :dependsOnRole false :parallelModeNeeded false :jitFlags 0 :planTree {MODIFYTABLE :startup_cost 0.14 :total_cost 2.16 :plan_rows 1 :plan_width 6 :parallel_aware false :parallel_safe false :plan_node_id 0 :targetlist <> :qual <> :lefttree <> :righttree <> :initPlan <> :extParam (b) :allParam (b) :operation 4 :canSetTag true :nominalRelation 1 :rootRelation 1 :partColsUpdated false :resultRelations (i 2) :resultRelIndex 0 :rootResultRelIndex 0 :plans ( {INDEXSCAN :startup_cost 0.14 :total_cost 2.16 :plan_rows 1 :plan_width 6 :parallel_aware false :parallel_safe false :plan_node_id 1 :targetlist ( {TARGETENTRY :expr {VAR :varno 2 :varattno -1 :vartype 27 :vartypmod -1 :varcollid 0 :varlevelsup 0 :varnoold 2 :varoattno -1 :location -1 } :resno 1 :resname ctid :ressortgroupref 0 :resorigtbl 0 :resorigcol 0 :resjunk true } ) :qual <> :lefttree <> :righttree <> :initPlan <> :extParam (b 0) :allParam (b 0) :scanrelid 2 :indexid 11295617 :indexqual ( {OPEXPR :opno 96 :opfuncid 65 :opresulttype 16 :opretset false :opcollid 0 :inputcollid 0 :args ( {VAR :varno 65002 :varattno 1 :vartype 23 :vartypmod -1 :varcollid 0 :varlevelsup 0 :varnoold 2 :varoattno 1 :location -1 } {CONST :consttype 23 :consttypmod -1 :constcollid 0 :constlen 4 :constbyval true :constisnull false :location 54 :constvalue 4 [ 4 2 0 0 0 0 0 0 ] } ) :location 52 } {OPEXPR :opno 416 :opfuncid 474 :opresulttype 16 :opretset false :opcollid 0 :inputcollid 0 :args ( {VAR :varno 65002 :varattno 2 :vartype 20 :vartypmod -1 :varcollid 0 :varlevelsup 0 :varnoold 2 :varoattno 30 :location -1 } {CONST :consttype 23 :consttypmod -1 :constcollid 0 :constlen 4 :constbyval true :constisnull false :location 77 :constvalue 4 [ 102 0 0 0 0 0 0 0 ] } ) :location 75 } ) :indexqualorig ( {OPEXPR :opno 96 :opfuncid 65 :opresulttype 16 :opretset false :opcollid 0 :inputcollid 0 :args ( {VAR :varno 2 :varattno 1 :vartype 23 :vartypmod -1 :varcollid 0 :varlevelsup 0 :varnoold 2 :varoattno 1 :location -1 } {CONST :consttype 23 :consttypmod -1 :constcollid 0 :constlen 4 :constbyval true :constisnull false :location 54 :constvalue 4 [ 4 2 0 0 0 0 0 0 ] } ) :location 52 } {OPEXPR :opno 416 :opfuncid 474 :opresulttype 16 :opretset false :opcollid 0 :inputcollid 0 :args ( {VAR :varno 2 :varattno 30 :vartype 20 :vartypmod -1 :varcollid 0 :varlevelsup 0 :varnoold 2 :varoattno 30 :location -1 } {CONST :consttype 23 :consttypmod -1 :constcollid 0 :constlen 4 :constbyval true :constisnull false :location 77 :constvalue 4 [ 102 0 0 0 0 0 0 0 ] } ) :location 75 } ) :indexorderby <> :indexorderbyorig <> :indexorderbyops <> :indexorderdir 1 } ) :withCheckOptionLists <> :returningLists <> :fdwPrivLists (<>) :fdwDirectModifyPlans (b) :rowMarks <> :epqParam 0 :onConflictAction 0 :arbiterIndexes <> :onConflictSet <> :onConflictWhere <> :exclRelRTI 0 :exclRelTlist <> } :rtable ( {RTE :alias <> :eref {ALIAS :aliasname p_ci_pipelines :colnames ("id" "ref" "sha" "before_sha" "created_at" "updated_at" "t ag" "yaml_errors" "committed_at" "project_id" "status" "started_at" " finished_at" "duration" "user_id" "lock_version" "auto_canceled_by_id " "pipeline_schedule_id" "source" "config_source" "protected" "failur e_reason" "iid" "merge_request_id" "source_sha" "target_sha" "externa l_pull_request_id" "ci_ref_id" "locked" "partition_id") } :rtekind 0 :relid 11294925 :relkind p :rellockmode 3 :tablesample <> :lateral false :inh true :inFromCl false :requiredPerms 10 :checkAsUser 0 :selectedCols (b 8 37) :insertedCols (b) :updatedCols (b) :extraUpdatedCols (b) :securityQuals <> } {RTE :alias <> :eref {ALIAS :aliasname p_ci_pipelines :colnames ("id" "ref" "sha" "before_sha" "created_at" "updated_at" "t ag" "yaml_errors" "committed_at" "project_id" "status" "started_at" " finished_at" "duration" "user_id" "lock_version" "auto_canceled_by_id " "pipeline_schedule_id" "source" "config_source" "protected" "failur e_reason" "iid" "merge_request_id" "source_sha" "target_sha" "externa l_pull_request_id" "ci_ref_id" "locked" "partition_id") } :rtekind 0 :relid 11295609 :relkind r :rellockmode 3 :tablesample <> :lateral false :inh false :inFromCl false :requiredPerms 0 :checkAsUser 0 :selectedCols (b 8 37) :insertedCols (b) :updatedCols (b) :extraUpdatedCols (b) :securityQuals <> } ) :resultRelations (i 2) :rootResultRelations (i 1) :subplans <> :rewindPlanIDs (b) :rowMarks <> :relationOids (o 11294925 11295609) :invalItems <> :paramExecTypes (o 0) :utilityStmt <> :stmt_location 0 :stmt_len 0 } 2022-07-14 16:51:30.780 EEST [67558] STATEMENT: explain analyze delete from p_ci_pipelines where id = 516 and partition_id = 102; 2022-07-14 16:51:30.781 EEST [67558] LOG: parse tree: 2022-07-14 16:51:30.781 EEST [67558] DETAIL: {QUERY :commandType 4 :querySource 0 :canSetTag true :utilityStmt <> :resultRelation 1 :hasAggs false :hasWindowFuncs false :hasTargetSRFs false :hasSubLinks false :hasDistinctOn false :hasRecursive false :hasModifyingCTE false :hasForUpdate false :hasRowSecurity false :cteList <> :rtable ( {RTE :alias <> :eref {ALIAS :aliasname p_ci_builds :colnames ("status" "finished_at" "trace" "created_at" "updated_at" " started_at" "runner_id" "coverage" "commit_id" "name" "options" "allo w_failure" "stage" "trigger_request_id" "stage_idx" "tag" "ref" "user _id" "type" "target_url" "description" "project_id" "erased_by_id" "e rased_at" "artifacts_expire_at" "environment" "when" "yaml_variables" "queued_at" "token" "lock_version" "coverage_regex" "auto_canceled_b y_id" "retried" "protected" "failure_reason" "scheduled_at" "token_en crypted" "upstream_pipeline_id" "resource_group_id" "waiting_for_reso urce_at" "processed" "scheduling_type" "id" "stage_id" "partition_id" ) } :rtekind 0 :relid 11294979 :relkind p :rellockmode 3 :tablesample <> :lateral false :inh true :inFromCl false :requiredPerms 10 :checkAsUser 0 :selectedCols (b 16 53) :insertedCols (b) :updatedCols (b) :extraUpdatedCols (b) :securityQuals <> } ) :jointree {FROMEXPR :fromlist ( {RANGETBLREF :rtindex 1 } ) :quals {BOOLEXPR :boolop and :args ( {OPEXPR :opno 96 :opfuncid 65 :opresulttype 16 :opretset false :opcollid 0 :inputcollid 0 :args ( {PARAM :paramkind 0 :paramid 1 :paramtype 23 :paramtypmod -1 :paramcollid 0 :location 41 } {VAR :varno 1 :varattno 9 :vartype 23 :vartypmod -1 :varcollid 0 :varlevelsup 0 :varnoold 1 :varoattno 9 :location 67 } ) :location 44 } {OPEXPR :opno 410 :opfuncid 467 :opresulttype 16 :opretset false :opcollid 0 :inputcollid 0 :args ( {PARAM :paramkind 0 :paramid 2 :paramtype 20 :paramtypmod -1 :paramcollid 0 :location 83 } {VAR :varno 1 :varattno 46 :vartype 20 :vartypmod -1 :varcollid 0 :varlevelsup 0 :varnoold 1 :varoattno 46 :location 109 } ) :location 86 } ) :location 79 } } :targetList <> :override 0 :onConflict <> :returningList <> :groupClause <> :groupingSets <> :havingQual <> :windowClause <> :distinctClause <> :sortClause <> :limitOffset <> :limitCount <> :rowMarks <> :setOperations <> :constraintDeps <> :withCheckOptions <> :stmt_location 0 :stmt_len 0 } 2022-07-14 16:51:30.781 EEST [67558] CONTEXT: SQL statement "DELETE FROM "public"."p_ci_builds" WHERE $1 OPERATOR(pg_catalog.=) "commit_id" AND $2 OPERATOR(pg_catalog.=) "partition_id"" 2022-07-14 16:51:30.781 EEST [67558] STATEMENT: explain analyze delete from p_ci_pipelines where id = 516 and partition_id = 102; 2022-07-14 16:51:30.781 EEST [67558] LOG: rewritten parse tree: 2022-07-14 16:51:30.781 EEST [67558] DETAIL: ( {QUERY :commandType 4 :querySource 0 :canSetTag true :utilityStmt <> :resultRelation 1 :hasAggs false :hasWindowFuncs false :hasTargetSRFs false :hasSubLinks false :hasDistinctOn false :hasRecursive false :hasModifyingCTE false :hasForUpdate false :hasRowSecurity false :cteList <> :rtable ( {RTE :alias <> :eref {ALIAS :aliasname p_ci_builds :colnames ("status" "finished_at" "trace" "created_at" "updated_at" " started_at" "runner_id" "coverage" "commit_id" "name" "options" "allo w_failure" "stage" "trigger_request_id" "stage_idx" "tag" "ref" "user _id" "type" "target_url" "description" "project_id" "erased_by_id" "e rased_at" "artifacts_expire_at" "environment" "when" "yaml_variables" "queued_at" "token" "lock_version" "coverage_regex" "auto_canceled_b y_id" "retried" "protected" "failure_reason" "scheduled_at" "token_en crypted" "upstream_pipeline_id" "resource_group_id" "waiting_for_reso urce_at" "processed" "scheduling_type" "id" "stage_id" "partition_id" ) } :rtekind 0 :relid 11294979 :relkind p :rellockmode 3 :tablesample <> :lateral false :inh true :inFromCl false :requiredPerms 10 :checkAsUser 0 :selectedCols (b 16 53) :insertedCols (b) :updatedCols (b) :extraUpdatedCols (b) :securityQuals <> } ) :jointree {FROMEXPR :fromlist ( {RANGETBLREF :rtindex 1 } ) :quals {BOOLEXPR :boolop and :args ( {OPEXPR :opno 96 :opfuncid 65 :opresulttype 16 :opretset false :opcollid 0 :inputcollid 0 :args ( {PARAM :paramkind 0 :paramid 1 :paramtype 23 :paramtypmod -1 :paramcollid 0 :location 41 } {VAR :varno 1 :varattno 9 :vartype 23 :vartypmod -1 :varcollid 0 :varlevelsup 0 :varnoold 1 :varoattno 9 :location 67 } ) :location 44 } {OPEXPR :opno 410 :opfuncid 467 :opresulttype 16 :opretset false :opcollid 0 :inputcollid 0 :args ( {PARAM :paramkind 0 :paramid 2 :paramtype 20 :paramtypmod -1 :paramcollid 0 :location 83 } {VAR :varno 1 :varattno 46 :vartype 20 :vartypmod -1 :varcollid 0 :varlevelsup 0 :varnoold 1 :varoattno 46 :location 109 } ) :location 86 } ) :location 79 } } :targetList <> :override 0 :onConflict <> :returningList <> :groupClause <> :groupingSets <> :havingQual <> :windowClause <> :distinctClause <> :sortClause <> :limitOffset <> :limitCount <> :rowMarks <> :setOperations <> :constraintDeps <> :withCheckOptions <> :stmt_location 0 :stmt_len 0 } ) 2022-07-14 16:51:30.781 EEST [67558] CONTEXT: SQL statement "DELETE FROM "public"."p_ci_builds" WHERE $1 OPERATOR(pg_catalog.=) "commit_id" AND $2 OPERATOR(pg_catalog.=) "partition_id"" 2022-07-14 16:51:30.781 EEST [67558] STATEMENT: explain analyze delete from p_ci_pipelines where id = 516 and partition_id = 102; 2022-07-14 16:51:30.782 EEST [67558] LOG: plan: 2022-07-14 16:51:30.782 EEST [67558] DETAIL: {PLANNEDSTMT :commandType 4 :queryId 0 :hasReturning false :hasModifyingCTE false :canSetTag true :transientPlan false :dependsOnRole false :parallelModeNeeded false :jitFlags 0 :planTree {MODIFYTABLE :startup_cost 0.14 :total_cost 4.05 :plan_rows 1 :plan_width 6 :parallel_aware false :parallel_safe false :plan_node_id 0 :targetlist <> :qual <> :lefttree <> :righttree <> :initPlan <> :extParam (b) :allParam (b) :operation 4 :canSetTag true :nominalRelation 1 :rootRelation 1 :partColsUpdated false :resultRelations (i 2) :resultRelIndex 0 :rootResultRelIndex 0 :plans ( {INDEXSCAN :startup_cost 0.14 :total_cost 4.05 :plan_rows 1 :plan_width 6 :parallel_aware false :parallel_safe false :plan_node_id 1 :targetlist ( {TARGETENTRY :expr {VAR :varno 2 :varattno -1 :vartype 27 :vartypmod -1 :varcollid 0 :varlevelsup 0 :varnoold 2 :varoattno -1 :location -1 } :resno 1 :resname ctid :ressortgroupref 0 :resorigtbl 0 :resorigcol 0 :resjunk true } ) :qual ( {OPEXPR :opno 96 :opfuncid 65 :opresulttype 16 :opretset false :opcollid 0 :inputcollid 0 :args ( {CONST :consttype 23 :consttypmod -1 :constcollid 0 :constlen 4 :constbyval true :constisnull false :location -1 :constvalue 4 [ 4 2 0 0 0 0 0 0 ] } {VAR :varno 2 :varattno 9 :vartype 23 :vartypmod -1 :varcollid 0 :varlevelsup 0 :varnoold 2 :varoattno 9 :location -1 } ) :location 44 } ) :lefttree <> :righttree <> :initPlan <> :extParam (b 0) :allParam (b 0) :scanrelid 2 :indexid 11295638 :indexqual ( {OPEXPR :opno 410 :opfuncid 467 :opresulttype 16 :opretset false :opcollid 0 :inputcollid 0 :args ( {VAR :varno 65002 :varattno 2 :vartype 20 :vartypmod -1 :varcollid 0 :varlevelsup 0 :varnoold 2 :varoattno 46 :location -1 } {CONST :consttype 20 :consttypmod -1 :constcollid 0 :constlen 8 :constbyval true :constisnull false :location -1 :constvalue 8 [ 102 0 0 0 0 0 0 0 ] } ) :location 86 } ) :indexqualorig ( {OPEXPR :opno 410 :opfuncid 467 :opresulttype 16 :opretset false :opcollid 0 :inputcollid 0 :args ( {VAR :varno 2 :varattno 46 :vartype 20 :vartypmod -1 :varcollid 0 :varlevelsup 0 :varnoold 2 :varoattno 46 :location -1 } {CONST :consttype 20 :consttypmod -1 :constcollid 0 :constlen 8 :constbyval true :constisnull false :location -1 :constvalue 8 [ 102 0 0 0 0 0 0 0 ] } ) :location 86 } ) :indexorderby <> :indexorderbyorig <> :indexorderbyops <> :indexorderdir 1 } ) :withCheckOptionLists <> :returningLists <> :fdwPrivLists (<>) :fdwDirectModifyPlans (b) :rowMarks <> :epqParam 0 :onConflictAction 0 :arbiterIndexes <> :onConflictSet <> :onConflictWhere <> :exclRelRTI 0 :exclRelTlist <> } :rtable ( {RTE :alias <> :eref {ALIAS :aliasname p_ci_builds :colnames ("status" "finished_at" "trace" "created_at" "updated_at" " started_at" "runner_id" "coverage" "commit_id" "name" "options" "allo w_failure" "stage" "trigger_request_id" "stage_idx" "tag" "ref" "user _id" "type" "target_url" "description" "project_id" "erased_by_id" "e rased_at" "artifacts_expire_at" "environment" "when" "yaml_variables" "queued_at" "token" "lock_version" "coverage_regex" "auto_canceled_b y_id" "retried" "protected" "failure_reason" "scheduled_at" "token_en crypted" "upstream_pipeline_id" "resource_group_id" "waiting_for_reso urce_at" "processed" "scheduling_type" "id" "stage_id" "partition_id" ) } :rtekind 0 :relid 11294979 :relkind p :rellockmode 3 :tablesample <> :lateral false :inh true :inFromCl false :requiredPerms 10 :checkAsUser 0 :selectedCols (b 16 53) :insertedCols (b) :updatedCols (b) :extraUpdatedCols (b) :securityQuals <> } {RTE :alias <> :eref {ALIAS :aliasname p_ci_builds :colnames ("status" "finished_at" "trace" "created_at" "updated_at" " started_at" "runner_id" "coverage" "commit_id" "name" "options" "allo w_failure" "stage" "trigger_request_id" "stage_idx" "tag" "ref" "user _id" "type" "target_url" "description" "project_id" "erased_by_id" "e rased_at" "artifacts_expire_at" "environment" "when" "yaml_variables" "queued_at" "token" "lock_version" "coverage_regex" "auto_canceled_b y_id" "retried" "protected" "failure_reason" "scheduled_at" "token_en crypted" "upstream_pipeline_id" "resource_group_id" "waiting_for_reso urce_at" "processed" "scheduling_type" "id" "stage_id" "partition_id" ) } :rtekind 0 :relid 11295631 :relkind r :rellockmode 3 :tablesample <> :lateral false :inh false :inFromCl false :requiredPerms 0 :checkAsUser 0 :selectedCols (b 16 53) :insertedCols (b) :updatedCols (b) :extraUpdatedCols (b) :securityQuals <> } ) :resultRelations (i 2) :rootResultRelations (i 1) :subplans <> :rewindPlanIDs (b) :rowMarks <> :relationOids (o 11294979 11295631) :invalItems <> :paramExecTypes (o 0) :utilityStmt <> :stmt_location 0 :stmt_len 0 } 2022-07-14 16:51:30.782 EEST [67558] CONTEXT: SQL statement "DELETE FROM "public"."p_ci_builds" WHERE $1 OPERATOR(pg_catalog.=) "commit_id" AND $2 OPERATOR(pg_catalog.=) "partition_id"" 2022-07-14 16:51:30.782 EEST [67558] STATEMENT: explain analyze delete from p_ci_pipelines where id = 516 and partition_id = 102; 2022-07-14 16:51:30.782 EEST [67558] DEBUG: CommitTransaction(1) name: unnamed; blockState: STARTED; state: INPROGRESS, xid/subid/cid: 8106593/1/2 2022-07-14 16:51:30.898 EEST [64971] DEBUG: snapshot of 0+0 running transaction ids (lsn 9/E7DE07E8 oldest xid 8106594 latest complete 8106593 next xid 8106594)
Database config used to capture the logs
log_directory = 'pg_log' log_filename = 'postgresql-dateformat.log' log_statement = 'all' logging_collector = on debug_print_parse = on debug_print_rewritten = on debug_print_plan = on log_min_messages = 'DEBUG5'
Deleting a pipeline by id from p_ci_pipelines deletes the corresponding builds from the right partition
But does it need to scan all partitions if
p_ci_builds
are also partitioned? Meaning that if we don't provide partitioning key PostgreSQL will still need to scan all partitions, right @mbobin? I think that it highlights the importance of having uniform partition_id assign to a pipeline and related resources.But does it need to scan all partitions if
p_ci_builds
are also partitioned?No, it doesn't scan all partitions because we need to include the partition key in the definition:
FOREIGN KEY (commit_id, partition_id) REFERENCES p_ci_pipelines(id, partition_id)
.with both tables partitioned, I've executed a query to delete a pipeline. Notice that in this case it iterates through all pipeline partitions and triggers the FK removal only on the partition that holds the data(
p_ci_pipelines_103
):gitlabhq_development_ci=# explain analyze delete from p_ci_pipelines where id = 564; QUERY PLAN ----------------------------------------------------------------------------------------------------------------------------------------------------------------- Delete on p_ci_pipelines (cost=0.15..12.98 rows=6 width=6) (actual time=0.045..0.046 rows=0 loops=1) Delete on ci_pipelines Delete on p_ci_pipelines_101 Delete on p_ci_pipelines_102 Delete on p_ci_pipelines_103 Delete on p_ci_pipelines_104 Delete on p_ci_pipelines_105 -> Index Scan using ci_pipelines_pkey on ci_pipelines (cost=0.15..2.17 rows=1 width=6) (actual time=0.011..0.011 rows=0 loops=1) Index Cond: (id = 564) -> Index Scan using p_ci_pipelines_101_id_partition_id_key on p_ci_pipelines_101 (cost=0.14..2.16 rows=1 width=6) (actual time=0.002..0.002 rows=0 loops=1) Index Cond: (id = 564) -> Index Scan using p_ci_pipelines_102_id_partition_id_key on p_ci_pipelines_102 (cost=0.14..2.16 rows=1 width=6) (actual time=0.004..0.004 rows=0 loops=1) Index Cond: (id = 564) -> Index Scan using p_ci_pipelines_103_id_partition_id_key on p_ci_pipelines_103 (cost=0.14..2.16 rows=1 width=6) (actual time=0.006..0.006 rows=1 loops=1) Index Cond: (id = 564) -> Index Scan using p_ci_pipelines_104_id_partition_id_key on p_ci_pipelines_104 (cost=0.14..2.16 rows=1 width=6) (actual time=0.004..0.004 rows=0 loops=1) Index Cond: (id = 564) -> Index Scan using p_ci_pipelines_105_id_partition_id_key on p_ci_pipelines_105 (cost=0.14..2.16 rows=1 width=6) (actual time=0.004..0.004 rows=0 loops=1) Index Cond: (id = 564) Planning Time: 23.025 ms Trigger for constraint p_ci_builds_commit_id_partition_id_fkey3 on p_ci_pipelines_103: time=0.268 calls=1 Execution Time: 0.451 ms (22 rows)
And in the low level logs we can see that it uses both attributes(
commit_id, partition_id
) from the KF definition to delete the builds data, meaning that it will scan onlyp_ci_builds_103 (partition_id=103)
:2022-07-18 15:24:32.439 EEST [98692] CONTEXT: SQL statement "DELETE FROM "public"."p_ci_builds" WHERE $1 OPERATOR(pg_catalog.=) "commit_id" AND $2 OPERATOR(pg_catalog.=) "partition_id"" 2022-07-18 15:24:32.439 EEST [98692] STATEMENT: explain analyze delete from p_ci_pipelines where id = 564;
If we do provide a
partition_id
for pipelines deletion it will scan only that partition.Does this answer your question?
It answers my question @mbobin. I wonder how difficult it will actually be to redefine foreign keys, as we would need to also replace indexes, right? We know it might be difficult in the current setup.
@grzesiek I think we'll have to add
partition_id
to at least 23 tables and update their FK definitions.[45] pry(main)> ci_tables = ::Gitlab::Database::GitlabSchema.tables_to_schema.select { |k,v| v == :gitlab_ci }.keys.map { |t| Class.new(Ci::ApplicationRecord) { |c| c.table_name = t } } ; nil => nil [46] pry(main)> ci_tables.size => 54 [53] pry(main)> ci_tables.select { |t| (t.column_names & ['build_id', 'job_id', 'stage_id', 'pipeline_id', 'commit_id']).any? }.size => 23 [54] pry(main)> ci_tables.select { |t| (t.column_names & ['build_id', 'job_id', 'stage_id', 'pipeline_id', 'commit_id']).any? }.map(&:table_name) => ["ci_build_needs", "ci_build_pending_states", "ci_build_report_results", "ci_builds", "ci_builds_metadata", "ci_builds_runner_session", "ci_build_trace_chunks", "ci_build_trace_metadata", "ci_job_artifacts", "ci_job_variables", "ci_pending_builds", "ci_pipeline_artifacts", "ci_pipeline_chat_data", "ci_pipeline_messages", "ci_pipelines_config", "ci_pipeline_variables", "ci_resources", "ci_running_builds", "ci_sources_pipelines", "ci_sources_projects", "ci_stages", "ci_trigger_requests", "ci_unit_test_failures"]
@grzesiek my first thoughts here are that we'll need to redefine the PK and FK constraints to include the
partition_id
and other required indices. But after that the cascading FK deletion should just work within the specified partitions.To sum up:
- we need to add an
UNIQUE (id, partition_id)
constraint on all partitioned tables - we need to add
partition_id
to all tables that reference a partitioned table. This might complicate things later on when we decide to partition those tables because we'd need to move data around, but it should be similar to the plan for moving existing pipelines data topartition_id < 100
partitions - update foreign key definitions to use
(relation_id, partition_id)
and indexes where necessary
@grzesiek do we have enough information to close #358943 (closed)?
/cc @morefice
- we need to add an
do we have enough information to close #358943 (closed)?
@mbobin Yes! I believe that we can close this, but perhaps let's do that after we update the partitioning design document. I believe that we need to update this page https://docs.gitlab.com/ee/architecture/blueprints/ci_data_decay/pipeline_partitioning.html with what we've learned from the PoC!
Let's discuss the ideas about how to adjust our direction in the diff of that merge request, because we initially planned to add
partition_id
only to a few tables. /cc @moreficeMR opened at !92781 (merged)
Awesome work @mbobin
Thanks for the heads up, catching up everything that has been discussed in this MR
Should we create the associated issues related to our latest discoveries?
update foreign key definitions to use
(relation_id, partition_id)
and indexes where necessaryThis will be fun
Edited by Max Orefice
Allure report
allure-report-publisher
generated test report!review-qa-blocking:
test report for 57e429f3expand test summary
+-----------------------------------------------------------------------------------------+ | suites summary | +------------------------------------+--------+--------+---------+-------+-------+--------+ | | passed | failed | skipped | flaky | total | result | +------------------------------------+--------+--------+---------+-------+-------+--------+ | Create | 0 | 23 | 2 | 23 | 25 | ❌ | | Plan | 0 | 47 | 1 | 47 | 48 | ❌ | | Verify | 0 | 12 | 1 | 12 | 13 | ❌ | | Manage | 3 | 37 | 0 | 40 | 40 | ❌ | | Version sanity check | 0 | 0 | 1 | 0 | 1 | ➖ | | Configure | 0 | 0 | 1 | 0 | 1 | ➖ | | Feature flag handler sanity checks | 9 | 0 | 0 | 9 | 9 | ❗ | | Package | 0 | 0 | 1 | 0 | 1 | ➖ | | Protect | 0 | 2 | 0 | 2 | 2 | ❌ | | Secure | 0 | 2 | 0 | 2 | 2 | ❌ | +------------------------------------+--------+--------+---------+-------+-------+--------+ | Total | 12 | 123 | 7 | 135 | 142 | ❌ | +------------------------------------+--------+--------+---------+-------+-------+--------+
removed [deprecated] Accepting merge requests label
mentioned in merge request !92535 (merged)
- A deleted user
added database databasereview pending labels
10 10 deployments: 2, 11 11 milestones: 3, 12 12 epics: 4, 13 ci_pipelines: 5, 13 p_ci_pipelines: 5, Agreed we definitely need a strategy for rollback in case things are broken
Edited by Max Orefice
mentioned in merge request !92781 (merged)
removed workflowready for development label
with !92781 (merged) merged we can close this one.
mentioned in merge request !96815 (merged)
added CI data partitioning label
mentioned in epic &7522
added Category:Continuous Integration label and removed 1 deleted label
mentioned in merge request mbobin/headway!1 (closed)
mentioned in merge request gitlab-org/ci-cd/ops-eng-managers/headway!1 (merged)