Migration to GCP. Database-related improvements (the roadmap)
Postgres-related TODO items for GCP migration:
- Enable streaming replication from the Azure master to a GCP replica. WAL shipping (based on WAL-E) remains as a backup. migration#715 (closed) (older related issue: migration#170 (closed))
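A sketch of the replica-side configuration this implies (hostname, replication user, and paths are placeholders; a real WAL-E setup usually wraps the command in `envdir` with its credentials):

```
# recovery.conf on the GCP replica (PostgreSQL 9.x style; values hypothetical)
standby_mode = 'on'
# stream WAL directly from the Azure master
primary_conninfo = 'host=azure-master.example.com port=5432 user=replication sslmode=require'
# fall back to fetching archived WAL via WAL-E if streaming falls behind
restore_command = 'wal-e wal-fetch "%f" "%p"'
```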
- Benchmark Postgres on Azure and GCP instances: we want to ascertain that the GCP instance can handle the load.
  - Prep:
    - Benchmark https://gitlab.com/gitlab-com/infrastructure/issues/4632
      - pgbench benchmarks
      - disk-level benchmarks
      - "real workload" benchmarks
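The pgbench leg of the prep could look like the following (scale factor, client count, and duration are illustrative, not decided values); the commands are printed rather than executed so the plan can be reviewed before running it against real instances:

```shell
# Hypothetical pgbench parameters -- tune against the real instance sizes.
SCALE=1000        # pgbench scale factor (~15 GB of generated data)
CLIENTS=32        # concurrent client connections
DURATION=600      # seconds per run

# Print the planned invocations (run on both the Azure and GCP candidates).
echo "pgbench -i -s ${SCALE} benchdb"
echo "pgbench -c ${CLIENTS} -j 8 -T ${DURATION} -P 60 benchdb"
```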
- Benchmark network bandwidth and latency between Azure and GCP: this will aid in predicting timings during the migration.
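The bandwidth numbers feed directly into the timing estimates. A back-of-the-envelope calculation (all figures hypothetical) for how long the GCP replica needs to catch up on accumulated WAL, given that the link must outrun WAL generation:

```shell
LAG_BYTES=$((10 * 1024 * 1024 * 1024))   # 10 GiB of accumulated WAL (hypothetical)
LINK_BPS=125000000                        # 1 Gbit/s Azure->GCP link, in bytes/s
WAL_BPS=20000000                          # master generates ~20 MB/s of new WAL

# Catch-up converges only while the link delivers faster than new WAL appears.
CATCHUP_S=$(awk -v l="$LAG_BYTES" -v b="$LINK_BPS" -v w="$WAL_BPS" \
  'BEGIN { printf "%d", l / (b - w) }')
echo "estimated catch-up: ${CATCHUP_S}s"
```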
- Set up a “tombstone” table to help signal the end of data replication. migration#716 (closed)
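One way the tombstone could work (table and column names are hypothetical): a single row written on the Azure side as the very last write, whose arrival on the GCP replica proves replication has fully caught up:

```sql
-- On the Azure master, as the last write before cutover (names hypothetical):
CREATE TABLE IF NOT EXISTS migration_tombstone (
    id         serial PRIMARY KEY,
    note       text,
    created_at timestamptz NOT NULL DEFAULT now()
);
INSERT INTO migration_tombstone (note) VALUES ('end of Azure writes');

-- On the GCP replica, poll until this returns the row:
SELECT * FROM migration_tombstone WHERE note = 'end of Azure writes';
```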
- After downtime starts, update pg_hba.conf and reload the configuration to prevent writes in Azure. migration#717 (closed)
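A sketch of what the pg_hba.conf change might look like (networks and users are placeholders). Rules are matched first-match-wins, so the narrow allows come before the catch-all reject:

```
# pg_hba.conf on the Azure master during downtime (addresses hypothetical)
# Keep replication to GCP and local admin access; reject everything else.
host   replication  replication  10.1.0.0/16  md5
local  all          postgres                  peer
host   all          all          0.0.0.0/0    reject
```

A reload (`pg_ctl reload` or `SELECT pg_reload_conf();`) applies this without a restart, but it only affects new connections; sessions that are already open still have to be terminated separately.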
- Develop a switch-back mechanism:
  - based on logical decoding
  - additionally, based on physical replication
- Do not use repmgr for the switchover. Switch over to the designated Postgres GCP host using the basic/native mechanism (pg_ctl promote or the trigger file). migration#718 (closed)
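The native promotion referred to above can be as small as the following (the data directory and trigger-file path are placeholders; the trigger-file route only works if recovery.conf names that path in trigger_file). Commands are printed rather than executed, as a reviewable sketch:

```shell
PGDATA=/var/opt/gitlab/postgresql/data    # hypothetical data directory
TRIGGER_FILE=/tmp/postgresql.trigger      # must match trigger_file in recovery.conf

# Option 1: promote the standby directly.
echo "pg_ctl -D ${PGDATA} promote"
# Option 2: create the trigger file and let the standby promote itself.
echo "touch ${TRIGGER_FILE}"
```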
- Review pgbouncer configuration on GCP.
- Study the potential data-loss scenario with pgbouncer and Consul DNS. Implement STONITH and/or adjust the pgbouncer DNS TTL configuration.
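If the DNS-TTL route is taken, the relevant pgbouncer knobs are dns_max_ttl and dns_nxdomain_ttl (the values below are illustrative, not tuned):

```ini
; pgbouncer.ini (illustrative values)
[pgbouncer]
; cap how long a resolved backend address may be cached, in seconds
dns_max_ttl = 15
; how long to cache a failed lookup before retrying
dns_nxdomain_ttl = 15
```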
- Review all database-related steps of the migration. migration#724 (closed)
  - script all pre- and post-checks
  - add more health checks for every step
  - add time estimates for all steps
- Turn SSL on for pgbouncers (double-check performance).
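The pgbouncer settings involved are the client/server TLS options (available from pgbouncer 1.7; the key and certificate paths below are placeholders). The performance caveat is the extra handshake and CPU cost, which is what the benchmarking should quantify:

```ini
; pgbouncer.ini TLS settings (paths hypothetical)
[pgbouncer]
client_tls_sslmode = require
client_tls_key_file = /etc/pgbouncer/pgbouncer.key
client_tls_cert_file = /etc/pgbouncer/pgbouncer.crt
; also encrypt pgbouncer -> Postgres connections
server_tls_sslmode = require
```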
- Do at least 5 DB-only or full migration trials using the final version of the procedure, without code changes ("code freeze").
- Analyze the current workload on Azure and the current and short-term future needs for the GCP DB nodes; compare with the benchmarked capacity and GCP specs. https://gitlab.com/gitlab-com/infrastructure/issues/4650 (related: https://gitlab.com/gitlab-com/infrastructure/issues/4632)
- Reduce disk reads and writes on the master. Currently (observed visually with iotop, during ~10-minute periods): 100–200 MB/s reads and 50–100 MB/s writes, with spikes up to 1 GB/s reads and up to 200 MB/s writes. Aggressive autovacuum is the main contributor here and has to be tuned to be less aggressive. A good target would be <<100 MB/s for reads and <<50 MB/s for writes. https://gitlab.com/gitlab-com/infrastructure/issues/4653
  - gpg is among the main write-I/O contributors. Consider moving WAL encryption away from the master. https://gitlab.com/gitlab-com/infrastructure/issues/4654
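The autovacuum throttling mentioned above is mainly governed by the cost-based vacuum delay settings; a sketch (the values are starting points to benchmark, not recommendations — throttling too hard trades I/O for table bloat):

```
# postgresql.conf (illustrative values -- benchmark before adopting)
autovacuum_vacuum_cost_delay = 20ms   # longer sleeps between chunks of work -> gentler I/O
autovacuum_vacuum_cost_limit = 200    # smaller work budget spent before each sleep
autovacuum_max_workers = 3            # fewer concurrent workers, less parallel I/O
```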
- default_statistics_target = 1000 has to be applied to all nodes before the migration. #105 (closed)
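This can be rolled out via postgresql.conf or ALTER SYSTEM; either way a reload is enough (no restart), but planner statistics only grow to the new target at the next ANALYZE of each table:

```sql
-- applies cluster-wide; requires a configuration reload, not a restart
ALTER SYSTEM SET default_statistics_target = 1000;
SELECT pg_reload_conf();
```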
- Double-check that all scripts that stop Postgres will not fail due to active sessions.
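One concrete pitfall here: a "smart" shutdown waits for all clients to disconnect and can block indefinitely, while "fast" rolls back active sessions and still shuts down cleanly. A sketch, with the command printed for review (the data directory is a placeholder):

```shell
PGDATA=/var/opt/gitlab/postgresql/data   # hypothetical data directory

# fast mode disconnects active sessions instead of waiting for them,
# so a scripted stop cannot hang on lingering connections
echo "pg_ctl -D ${PGDATA} -m fast stop"
```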