Skip to content

Backfill merge_requests_merge_data from merge_requests

What does this MR do and why?

In this MR, we write background migration to backfill any merge_requests_merge_data table from merge_requests table.

We should have dual-write already at application code deployed, https://gitlab.com/gitlab-org/gitlab/-/issues/560933+. Then this issue is to migrate any existing records untouched.

References

Screenshots or screen recordings

See below steps for verification

Query plan

INSERT INTO merge_requests_merge_data (
  merge_request_id,
  project_id,
  merge_user_id,
  merge_params,
  merge_error,
  merge_jid,
  merge_commit_sha,
  merged_commit_sha,
  merge_ref_sha,
  squash_commit_sha,
  in_progress_merge_commit_sha,
  merge_status,
  auto_merge_enabled,
  squash
)
SELECT
  mr.id AS merge_request_id,
  mr.target_project_id AS project_id,
  mr.merge_user_id,
  mr.merge_params,
  mr.merge_error,
  mr.merge_jid,
  decode(mr.merge_commit_sha, 'hex') AS merge_commit_sha,
  decode(encode(mr.merged_commit_sha, 'escape'), 'hex') AS merged_commit_sha,
  decode(encode(mr.merge_ref_sha, 'escape'), 'hex') AS merge_ref_sha,
  decode(encode(mr.squash_commit_sha, 'escape'), 'hex') AS squash_commit_sha,
  decode(mr.in_progress_merge_commit_sha, 'hex') AS in_progress_merge_commit_sha,
  CASE mr.merge_status
    WHEN 'unchecked' THEN 0
    WHEN 'preparing' THEN 1
    WHEN 'checking' THEN 2
    WHEN 'can_be_merged' THEN 3
    WHEN 'cannot_be_merged' THEN 4
    WHEN 'cannot_be_merged_recheck' THEN 5
    WHEN 'cannot_be_merged_rechecking' THEN 6
    ELSE 0
  END AS merge_status,
  mr.merge_when_pipeline_succeeds AS auto_merge_enabled,
  mr.squash
FROM merge_requests AS mr
LEFT JOIN merge_requests_merge_data mmd 
  ON mmd.merge_request_id = mr.id
WHERE mr.id BETWEEN 1 AND 1000
  AND mmd.merge_request_id IS NULL

https://console.postgres.ai/gitlab/gitlab-production-main/sessions/44668/commands/137084

How to set up and validate locally

  • Go to local Rails console rails c
  • Find first MR from the local mr1 = MergeRequest.find(1)
  • We will test on mainly 6 columns below,
    • merged_commit_sha, bytea on merge_requests table, and bytea on merge_requests_merge_data table (DB value different from other bytea fields)
    • merge_ref_sha, bytea on merge_requests table, and bytea on merge_requests_merge_data table
    • squash_commit_sha, bytea on merge_requests table, and bytea on merge_requests_merge_data table
    • merge_commit_sha, character on merge_requests table, and bytea on merge_requests_merge_data table
    • in_progress_merge_commit_sha, character on merge_requests table, and bytea on merge_requests_merge_data table
    • merge_status, character on merge_requests table, and smallint on merge_requests_merge_data table
    • merge_params, text on sql before and after, and Hash on model
    • Other columns are just normal columns, without column type change, and without special treatment for backfill script
  • Update mr1 with below code
Click to expand
mr1.update!(
  merged_commit_sha: '65e14876dd7203cbf4967ad4e0725185bbb77e5d',
  merge_ref_sha: 'e871af1e1a371b0470eb974713a1294efaa37ed2',
  merge_commit_sha: '531c9eedc5d51bfe14862020883ffa3538874eb6',
  in_progress_merge_commit_sha: 'acff93c1a5cc4174b99a874861a90963cab4b56d',
  merge_status: 'unchecked',
  merge_params: {
    "force_remove_source_branch" => "0",
    "should_remove_source_branch" => false
  }
)
  • We could see we update sha attributes to use actual value (grabbed from random prod data). merge_status is unchecked in this case, and also value for merge_params.
  • Connect to local psql psql -h /Users/zhaochen.li/Projects/gitlab-development-kit/postgresql -d gitlabhq_development
  • Then inspect each attribute actual values on DB server (notice there are data transform between Rails and DB)
Click to expand
SELECT merged_commit_sha, merge_ref_sha, merge_commit_sha, in_progress_merge_commit_sha, squash_commit_sha, merge_status, merge_params FROM merge_requests WHERE id = 1;

                                 merged_commit_sha                                  |               merge_ref_sha                |             merge_commit_sha             |       in_progress_merge_commit_sha       |             squash_commit_sha              | merge_status |            merge_params            
------------------------------------------------------------------------------------+--------------------------------------------+------------------------------------------+------------------------------------------+--------------------------------------------+--------------+------------------------------------
 \x36356531343837366464373230336362663439363761643465303732353138356262623737653564 | \xe871af1e1a371b0470eb974713a1294efaa37ed2 | 531c9eedc5d51bfe14862020883ffa3538874eb6 | acff93c1a5cc4174b99a874861a90963cab4b56d | \x359e01d217a43912746589cbbc015913b6b929d9 | unchecked    | ---                               +
                                                                                    |                                            |                                          |                                          |                                            |              | force_remove_source_branch: '0'   +
                                                                                    |                                            |                                          |                                          |                                            |              | should_remove_source_branch: false+
                                                                                    |                                            |                                          |                                          |                                            |              | 
(1 row)
  • Make sure we have no previous merge_requests_mrege_data created, MergeRequests::MergeData.delete_all
  • Perform sync for this single column only
Click to expand
Gitlab::BackgroundMigration::BackfillMergeRequestsMergeDataFromMergeRequests.new(
  start_id: 1,
  end_id: 1,
  batch_table: :merge_requests,
  batch_column: :id,
  sub_batch_size: 2,
  pause_ms: 0,
  connection: ActiveRecord::Base.connection
).perform
  • Check new merge_data created, MergeRequests::MergeData.all
Click to expand
[#<MergeRequests::MergeData:0x0000000352635b50
  merge_request_id: 1,
  project_id: 1,
  merge_user_id: nil,
  merge_params: {"force_remove_source_branch"=>"0", "should_remove_source_branch"=>false},
  merge_error: nil,
  merge_jid: nil,
  merge_commit_sha: "531c9eedc5d51bfe14862020883ffa3538874eb6",
  merged_commit_sha: "65e14876dd7203cbf4967ad4e0725185bbb77e5d",
  merge_ref_sha: "e871af1e1a371b0470eb974713a1294efaa37ed2",
  squash_commit_sha: "359e01d217a43912746589cbbc015913b6b929d9",
  in_progress_merge_commit_sha: "acff93c1a5cc4174b99a874861a90963cab4b56d",
  merge_status: 0,
  auto_merge_enabled: false,
  squash: false>]
  • Compare with existing merge_requests record for those columns
Click to expand
MergeRequest.find(1).attributes.slice(
  'merge_commit_sha',
  'merged_commit_sha',
  'merge_ref_sha',
  'squash_commit_sha',
  'in_progress_merge_commit_sha',
  'merge_params',
  'merge_status'
)

{"merge_commit_sha"=>"531c9eedc5d51bfe14862020883ffa3538874eb6",
 "merged_commit_sha"=>"65e14876dd7203cbf4967ad4e0725185bbb77e5d",
 "merge_ref_sha"=>"e871af1e1a371b0470eb974713a1294efaa37ed2",
 "squash_commit_sha"=>"359e01d217a43912746589cbbc015913b6b929d9",
 "in_progress_merge_commit_sha"=>"acff93c1a5cc4174b99a874861a90963cab4b56d",
 "merge_params"=>{"force_remove_source_branch"=>"0", "should_remove_source_branch"=>false},
 "merge_status"=>"unchecked"}
  • Now inspect each attribute actual values on DB server. We have all fields stored as standard bytea now. merge_commit_sha and in_progress_merge_commit_sha were string, and now in bytea. merged_commit_sha was bytea but stored using ASCII-encoded string as bytea, and now as proper binary bytea. merge_ref_sha and squash_commit_sha were just copied directly, and they are always binary bytea.
Click to expand
SELECT * FROM merge_requests_merge_data WHERE merge_request_id = 1;

merge_request_id | project_id | merge_user_id |            merge_params            | merge_error | merge_jid |              merge_commit_sha              |             merged_commit_sha              |               merge_ref_sha                |             squash_commit_sha              |        in_progress_merge_commit_sha        | merge_status | auto_merge_enabled | squash 
------------------+------------+---------------+------------------------------------+-------------+-----------+--------------------------------------------+--------------------------------------------+--------------------------------------------+--------------------------------------------+--------------------------------------------+--------------+--------------------+--------
                1 |          1 |               | ---                               +|             |           | \x531c9eedc5d51bfe14862020883ffa3538874eb6 | \x65e14876dd7203cbf4967ad4e0725185bbb77e5d | \xe871af1e1a371b0470eb974713a1294efaa37ed2 | \x359e01d217a43912746589cbbc015913b6b929d9 | \xacff93c1a5cc4174b99a874861a90963cab4b56d |            0 | f                  | f
                  |            |               | force_remove_source_branch: '0'   +|             |           |                                            |                                            |                                            |                                            |                                            |              |                    | 
                  |            |               | should_remove_source_branch: false+|             |           |                                            |                                            |                                            |                                            |                                            |              |                    | 
                  |            |               |                                    |             |           |                                            |                                            |                                            |                                            |                                            |              |                    | 
(1 row)
  • Now find second MR record, mr2 = MergeRequest.find(2)
  • Update second MR record, mr2.update!(merged_commit_sha: nil, merge_ref_sha: nil, merge_commit_sha: nil, in_progress_merge_commit_sha: nil, merge_status: 'cannot_be_merged', merge_params: nil)
  • Update first MR record again (to verify we would not touch exisiting merge_data record), mr1.update!(merged_commit_sha: nil, merge_params: nil)
  • Perform sync for 2 columns only
Click to expand
Gitlab::BackgroundMigration::BackfillMergeRequestsMergeDataFromMergeRequests.new(
  start_id: 1,
  end_id: 2,
  batch_table: :merge_requests,
  batch_column: :id,
  sub_batch_size: 2,
  pause_ms: 0,
  connection: ActiveRecord::Base.connection
).perform
  • Check both records, MergeRequests::MergeData.all
  • verify everything looks fine for second record, and verify first record does not change

MR acceptance checklist

Evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.

Edited by zli

Merge request reports

Loading