Org Mover: Copy PG data during org maintenance window

Problem

We need to copy the organization's PG data during the maintenance window.

(In the future, we will use logical replication, which will only need to catch up with any remaining data changes.)

Proposal

At a high level, we need to:

  • Generate all of the SQL COPY commands for all of the tables (see the sketch after this list)
  • Run the COPY commands against the legacy cell's replica DB(s)
  • Generate all of the insert commands for all of that data
  • Run the insert commands against the target cell's DB(s)
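
For illustration, a minimal sketch of the command-generation step, assuming a hypothetical filter column and ID list (the real per-table filtering is done by the generate-selects-for-organization script shown further below):

  -- Hypothetical generator: emits one COPY command per ordinary table in the
  -- public schema. <filter_column> and <root_namespace_ids> are placeholders;
  -- the real script derives the correct filter for each table from the FK graph.
  SELECT format(
    'COPY (SELECT * FROM %I WHERE <filter_column> IN (<root_namespace_ids>)) TO STDOUT;',
    tablename
  )
  FROM pg_tables
  WHERE schemaname = 'public';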

More details

https://gitlab.com/gitlab-org/gitlab/-/issues/458759#note_1971410175

When we work on the COPY approach implementation in the future, the following points are worth including in the blueprint:

  • Due to FK cycles, it is worth using the trick SET session_replication_role = 'replica';, with an additional check performed before the actual COPY (see the sketch after this list)
  • PG16 on the destination gave us 16h for the GitLab namespaces, but we believe we can get below 10h using several optimizations (not only parallelization for a single table)
  • If we use session_replication_role = replica, it's worth having a pre-step that tests the data to ensure we don't have dependent records that COPY doesn't take -- the prepare stage
  • Additionally, we could have a dry-run stage: run COPY against a destination cluster that is not in use, to see how it works, to ensure that there are no errors, and to actually measure the timing. So the workflow could be: prepare -> dry-run (optional) -> run.
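
For illustration, a minimal sketch of the session_replication_role trick, with placeholder table names (note that setting it to 'replica' normally requires superuser privileges):

  -- Sketch: with session_replication_role = 'replica', FK triggers are skipped,
  -- so tables that participate in FK cycles can be loaded in any order.
  BEGIN;
  SET LOCAL session_replication_role = 'replica';
  COPY merge_requests FROM STDIN;       -- would normally fail while the
  COPY merge_request_diffs FROM STDIN;  -- other side of the cycle is missing
  COMMIT;  -- SET LOCAL reverts automatically at transaction end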


Steps:

  1. Create a temporary target database for the 'dry-run' step:

     psql ... -c "create database org_mover_tmp_jhw4if7 owner gitlab"

  2. Duplicate the schema (without data) to the target database:

     pg_dump ... -d gitlabhq_production --schema-only | psql ... -d org_mover_tmp_jhw4if7

  3. Perform the COPY with restore to the target database. Example:

     psql ... -d gitlabhq_production -c "COPY ( SELECT * FROM <table_name> WHERE ... IN (...) ) TO STDOUT ..." | psql ... -d org_mover_tmp_jhw4if7 -c "COPY <table_name> FROM STDIN ..."
     "COPY ... "
     "COPY ... "

  4. Drop the temporary target database:

     psql ... -c "drop database org_mover_tmp_jhw4if7 with (force)"

due to FK cycles, it is worth using the trick SET session_replication_role = 'replica';, with an additional check performed before the actual COPY

I added the output of the detected cycles in debug mode; an example:

[WARNING] Cyclic dependencies detected: [["ci_pipelines", "ci_pipelines"], ["merge_requests", "merge_request_diffs", "merge_requests"], ["vulnerabilities", "vulnerability_occurrences", "vulnerabilities"], ["integrations", "integrations"]]
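
For reference, a minimal sketch of how such cycles can be found from the catalog (not the script's actual implementation): walk the FK graph recursively and report paths that return to their starting table.

  WITH RECURSIVE fk_edges AS (
    -- one edge per FK: referencing (child) table -> referenced (parent) table
    SELECT DISTINCT conrelid::regclass::text AS child,
                    confrelid::regclass::text AS parent
    FROM pg_constraint
    WHERE contype = 'f'
  ),
  walk AS (
    SELECT child, parent, ARRAY[child] AS path, false AS is_cycle
    FROM fk_edges
    UNION ALL
    SELECT e.child, e.parent, w.path || e.child, e.child = ANY(w.path)
    FROM walk w
    JOIN fk_edges e ON e.child = w.parent
    WHERE NOT w.is_cycle
  )
  SELECT DISTINCT path || parent AS cycle   -- each cycle may appear once per
  FROM walk                                 -- table it can start from
  WHERE parent = path[1];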

If we use session_replication_role = replica, it's worth having a pre-step that tests the data to ensure we don't have dependent records that COPY doesn't take -- the prepare stage

I added a check for the presence of all dependent tables; if any tables are missing, the script fails and does not generate COPY commands.

Example:

./generate-selects-for-organization --root-namespace-ids *** --psql   "psql -h localhost -p 6015 -U vitaliy -d gitlabhq_dblab -c"
[ERROR] Missing tables to cover all dependencies: ["jira_connect_installations", "namespaces", "epics", "sprints", "users", "milestones", "projects", "cluster_agents", "vulnerabilities", "vulnerability_scanners", "bulk_imports", "audit_events_instance_external_streaming_destinations", "zoekt_nodes", "zoekt_enabled_namespaces", "audit_events_instance_external_audit_event_destinations", "emails", "gitlab_partitions_dynamic.ci_builds_101", "p_ci_builds", "ci_builds", "gitlab_partitions_dynamic.ci_builds_102", "plans", "audit_events_group_external_streaming_destinations", "reviews", "geo_nodes", "ci_runners", "search_indices"]
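
A minimal sketch of that kind of dependency check (the real script derives the table set from the generated SELECTs; the VALUES list here is a placeholder):

  -- Report referenced (parent) tables that are not part of the copied set;
  -- a non-empty result means COPY would leave dangling FK references.
  WITH selected(tbl) AS (
    VALUES ('projects'), ('merge_requests')  -- placeholder: tables chosen for COPY
  )
  SELECT DISTINCT confrelid::regclass::text AS missing_table
  FROM pg_constraint
  WHERE contype = 'f'
    AND conrelid::regclass::text IN (SELECT tbl FROM selected)
    AND confrelid::regclass::text NOT IN (SELECT tbl FROM selected);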