Backfill merge_requests_merge_data from merge_requests
What does this MR do and why?
In this MR, we write background migration to backfill any merge_requests_merge_data table from merge_requests table.
We should have dual-write already at application code deployed, https://gitlab.com/gitlab-org/gitlab/-/issues/560933+. Then this issue is to migrate any existing records untouched.
References
- Issue - https://gitlab.com/gitlab-org/gitlab/-/issues/560935+
- Issue - https://gitlab.com/gitlab-org/gitlab/-/issues/560933+
Screenshots or screen recordings
See below steps for verification
Query plan
INSERT INTO merge_requests_merge_data (
merge_request_id,
project_id,
merge_user_id,
merge_params,
merge_error,
merge_jid,
merge_commit_sha,
merged_commit_sha,
merge_ref_sha,
squash_commit_sha,
in_progress_merge_commit_sha,
merge_status,
auto_merge_enabled,
squash
)
SELECT
mr.id AS merge_request_id,
mr.target_project_id AS project_id,
mr.merge_user_id,
mr.merge_params,
mr.merge_error,
mr.merge_jid,
decode(mr.merge_commit_sha, 'hex') AS merge_commit_sha,
decode(encode(mr.merged_commit_sha, 'escape'), 'hex') AS merged_commit_sha,
decode(encode(mr.merge_ref_sha, 'escape'), 'hex') AS merge_ref_sha,
decode(encode(mr.squash_commit_sha, 'escape'), 'hex') AS squash_commit_sha,
decode(mr.in_progress_merge_commit_sha, 'hex') AS in_progress_merge_commit_sha,
CASE mr.merge_status
WHEN 'unchecked' THEN 0
WHEN 'preparing' THEN 1
WHEN 'checking' THEN 2
WHEN 'can_be_merged' THEN 3
WHEN 'cannot_be_merged' THEN 4
WHEN 'cannot_be_merged_recheck' THEN 5
WHEN 'cannot_be_merged_rechecking' THEN 6
ELSE 0
END AS merge_status,
mr.merge_when_pipeline_succeeds AS auto_merge_enabled,
mr.squash
FROM merge_requests AS mr
LEFT JOIN merge_requests_merge_data mmd
ON mmd.merge_request_id = mr.id
WHERE mr.id BETWEEN 1 AND 1000
AND mmd.merge_request_id IS NULL
https://console.postgres.ai/gitlab/gitlab-production-main/sessions/44668/commands/137084
How to set up and validate locally
- Go to local Rails console
rails c - Find first MR from the local
mr1 = MergeRequest.find(1) - We will test on mainly 6 columns below,
-
merged_commit_sha,byteaonmerge_requeststable, andbyteaonmerge_requests_merge_datatable (DB value different from otherbyteafields) -
merge_ref_sha,byteaonmerge_requeststable, andbyteaonmerge_requests_merge_datatable -
squash_commit_sha,byteaonmerge_requeststable, andbyteaonmerge_requests_merge_datatable -
merge_commit_sha,characteronmerge_requeststable, andbyteaonmerge_requests_merge_datatable -
in_progress_merge_commit_sha,characteronmerge_requeststable, andbyteaonmerge_requests_merge_datatable -
merge_status,characteronmerge_requeststable, andsmallintonmerge_requests_merge_datatable -
merge_params,texton sql before and after, andHashon model - Other columns are just normal columns, without column type change, and without special treatment for backfill script
-
- Update
mr1with below code
Click to expand
mr1.update!(
merged_commit_sha: '65e14876dd7203cbf4967ad4e0725185bbb77e5d',
merge_ref_sha: 'e871af1e1a371b0470eb974713a1294efaa37ed2',
merge_commit_sha: '531c9eedc5d51bfe14862020883ffa3538874eb6',
in_progress_merge_commit_sha: 'acff93c1a5cc4174b99a874861a90963cab4b56d',
merge_status: 'unchecked',
merge_params: {
"force_remove_source_branch" => "0",
"should_remove_source_branch" => false
}
)
- We could see we update
shaattributes to use actual value (grabbed from random prod data).merge_statusisuncheckedin this case, and also value formerge_params. - Connect to local psql
psql -h /Users/zhaochen.li/Projects/gitlab-development-kit/postgresql -d gitlabhq_development - Then inspect each attribute actual values on DB server (notice there are data transform between Rails and DB)
Click to expand
SELECT merged_commit_sha, merge_ref_sha, merge_commit_sha, in_progress_merge_commit_sha, squash_commit_sha, merge_status, merge_params FROM merge_requests WHERE id = 1;
merged_commit_sha | merge_ref_sha | merge_commit_sha | in_progress_merge_commit_sha | squash_commit_sha | merge_status | merge_params
------------------------------------------------------------------------------------+--------------------------------------------+------------------------------------------+------------------------------------------+--------------------------------------------+--------------+------------------------------------
\x36356531343837366464373230336362663439363761643465303732353138356262623737653564 | \xe871af1e1a371b0470eb974713a1294efaa37ed2 | 531c9eedc5d51bfe14862020883ffa3538874eb6 | acff93c1a5cc4174b99a874861a90963cab4b56d | \x359e01d217a43912746589cbbc015913b6b929d9 | unchecked | --- +
| | | | | | force_remove_source_branch: '0' +
| | | | | | should_remove_source_branch: false+
| | | | | |
(1 row)
- Make sure we have no previous
merge_requests_mrege_datacreated,MergeRequests::MergeData.delete_all - Perform sync for this single column only
Click to expand
Gitlab::BackgroundMigration::BackfillMergeRequestsMergeDataFromMergeRequests.new(
start_id: 1,
end_id: 1,
batch_table: :merge_requests,
batch_column: :id,
sub_batch_size: 2,
pause_ms: 0,
connection: ActiveRecord::Base.connection
).perform
- Check new
merge_datacreated,MergeRequests::MergeData.all
Click to expand
[#<MergeRequests::MergeData:0x0000000352635b50
merge_request_id: 1,
project_id: 1,
merge_user_id: nil,
merge_params: {"force_remove_source_branch"=>"0", "should_remove_source_branch"=>false},
merge_error: nil,
merge_jid: nil,
merge_commit_sha: "531c9eedc5d51bfe14862020883ffa3538874eb6",
merged_commit_sha: "65e14876dd7203cbf4967ad4e0725185bbb77e5d",
merge_ref_sha: "e871af1e1a371b0470eb974713a1294efaa37ed2",
squash_commit_sha: "359e01d217a43912746589cbbc015913b6b929d9",
in_progress_merge_commit_sha: "acff93c1a5cc4174b99a874861a90963cab4b56d",
merge_status: 0,
auto_merge_enabled: false,
squash: false>]
- Compare with existing merge_requests record for those columns
Click to expand
MergeRequest.find(1).attributes.slice(
'merge_commit_sha',
'merged_commit_sha',
'merge_ref_sha',
'squash_commit_sha',
'in_progress_merge_commit_sha',
'merge_params',
'merge_status'
)
{"merge_commit_sha"=>"531c9eedc5d51bfe14862020883ffa3538874eb6",
"merged_commit_sha"=>"65e14876dd7203cbf4967ad4e0725185bbb77e5d",
"merge_ref_sha"=>"e871af1e1a371b0470eb974713a1294efaa37ed2",
"squash_commit_sha"=>"359e01d217a43912746589cbbc015913b6b929d9",
"in_progress_merge_commit_sha"=>"acff93c1a5cc4174b99a874861a90963cab4b56d",
"merge_params"=>{"force_remove_source_branch"=>"0", "should_remove_source_branch"=>false},
"merge_status"=>"unchecked"}
- Now inspect each attribute actual values on DB server. We have all fields stored as standard
byteanow.merge_commit_shaandin_progress_merge_commit_shawere string, and now inbytea.merged_commit_shawasbyteabut stored using ASCII-encoded string asbytea, and now as proper binarybytea.merge_ref_shaandsquash_commit_shawere just copied directly, and they are always binarybytea.
Click to expand
SELECT * FROM merge_requests_merge_data WHERE merge_request_id = 1;
merge_request_id | project_id | merge_user_id | merge_params | merge_error | merge_jid | merge_commit_sha | merged_commit_sha | merge_ref_sha | squash_commit_sha | in_progress_merge_commit_sha | merge_status | auto_merge_enabled | squash
------------------+------------+---------------+------------------------------------+-------------+-----------+--------------------------------------------+--------------------------------------------+--------------------------------------------+--------------------------------------------+--------------------------------------------+--------------+--------------------+--------
1 | 1 | | --- +| | | \x531c9eedc5d51bfe14862020883ffa3538874eb6 | \x65e14876dd7203cbf4967ad4e0725185bbb77e5d | \xe871af1e1a371b0470eb974713a1294efaa37ed2 | \x359e01d217a43912746589cbbc015913b6b929d9 | \xacff93c1a5cc4174b99a874861a90963cab4b56d | 0 | f | f
| | | force_remove_source_branch: '0' +| | | | | | | | | |
| | | should_remove_source_branch: false+| | | | | | | | | |
| | | | | | | | | | | | |
(1 row)
- Now find second MR record,
mr2 = MergeRequest.find(2) - Update second MR record,
mr2.update!(merged_commit_sha: nil, merge_ref_sha: nil, merge_commit_sha: nil, in_progress_merge_commit_sha: nil, merge_status: 'cannot_be_merged', merge_params: nil) - Update first MR record again (to verify we would not touch exisiting
merge_datarecord),mr1.update!(merged_commit_sha: nil, merge_params: nil) - Perform sync for 2 columns only
Click to expand
Gitlab::BackgroundMigration::BackfillMergeRequestsMergeDataFromMergeRequests.new(
start_id: 1,
end_id: 2,
batch_table: :merge_requests,
batch_column: :id,
sub_batch_size: 2,
pause_ms: 0,
connection: ActiveRecord::Base.connection
).perform
- Check both records,
MergeRequests::MergeData.all - verify everything looks fine for second record, and verify first record does not change
MR acceptance checklist
Evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.
Edited by zli