'pg-upgrade' only upgrades one of two PG clusters on a node with Geo tracking DB

Summary

This bug applies to GitLab instances with the following characteristics:

  • 3K reference architecture or higher
  • Geo is configured, and a Geo secondary instance is running
  • On the Geo secondary instance, there are one or more nodes with:
    • A main database (for example, a Praefect DB) and a Geo tracking database
    • The two databases are running on separate Postgres instances

When the Postgres servers need to be upgraded on these Geo secondary nodes, there are a couple of scenarios where running gitlab-ctl pg-upgrade upgrades only the Geo tracking DB, and not the main DB.

Here's the logic in the pg-upgrade code that detects what type of node the command is running on, and which upgrade methods to run:

  # The possible deployment types we need to handle are:
  # 1. Standalone regular postgresql node
  # 2. Standalone Geo primary node
  # 3. Standalone Geo secondary node
  # 4. Leader in Regular/Geo-Primary patroni cluster
  # 5. Replica in Regular/Geo-Primary patroni cluster
  # 6. Leader in Geo-Secondary patroni cluster (i.e, standby leader)
  # 7. Replica in Geo-Secondary patroni cluster

  if patroni_enabled
    if @instance_type == :patroni_leader
      patroni_leader_upgrade
    elsif @instance_type == :patroni_replica
      patroni_replica_upgrade
    elsif @instance_type == :patroni_standby_leader
      patroni_standby_leader_upgrade
    end
  elsif @roles.include?('geo_primary')
    log 'Detected a GEO primary node'
    @instance_type = :geo_primary
    general_upgrade
  elsif @roles.include?('geo_secondary')
    log 'Detected a Geo secondary node'
    @instance_type = :geo_secondary
    geo_secondary_upgrade(options[:tmp_dir], options[:timeout])
  elsif service_enabled?('geo-postgresql')
    log 'Detected a Geo PostgreSQL node'
    @instance_type = :geo_postgresql
    geo_pg_upgrade
  else
    general_upgrade
  end
end

When the geo-secondary role is configured (elsif @roles.include?('geo_secondary')) or the geo-postgresql service is enabled (elsif service_enabled?('geo-postgresql')), only the geo_pg_upgrade method is triggered, and the general_upgrade method is not. When this happens, you end up with an upgraded Geo tracking DB, but the main DB is still on the old Postgres version.
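To make the branch order easier to reason about, here is a minimal standalone sketch that mirrors the excerpt above. It is not the omnibus code: selected_upgrade_method is a hypothetical helper, the enabled services are passed in as a plain array instead of being queried through service_enabled?, and the 'postgres_role' entry in roles is an assumption based on the gitlab.rb in this issue.

```ruby
# Hypothetical helper: mirrors the branch order of the pg-upgrade excerpt and
# returns the name of the upgrade method that would be called.
def selected_upgrade_method(roles:, enabled_services:, patroni_enabled: false, instance_type: nil)
  if patroni_enabled
    {
      patroni_leader: :patroni_leader_upgrade,
      patroni_replica: :patroni_replica_upgrade,
      patroni_standby_leader: :patroni_standby_leader_upgrade
    }[instance_type]
  elsif roles.include?('geo_primary')
    :general_upgrade
  elsif roles.include?('geo_secondary')
    :geo_secondary_upgrade
  elsif enabled_services.include?('geo-postgresql')
    :geo_pg_upgrade
  else
    :general_upgrade
  end
end

# Node from this issue: both bundled Postgres services are enabled and no Geo
# role is set, so the geo-postgresql branch is taken and general_upgrade is not.
puts selected_upgrade_method(
  roles: ['postgres_role'], # assumed role name, taken from gitlab.rb
  enabled_services: %w[postgresql geo-postgresql]
)
# prints "geo_pg_upgrade"
```

Because the branches are mutually exclusive, exactly one upgrade method is selected per run, no matter how many Postgres services are enabled on the node.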

This may not technically be a "bug", but it is unexpected behavior. It's easy for customers to run pg-upgrade, see it succeed, and incorrectly assume that both DB instances on the node have been upgraded. If they then try to upgrade their GitLab instance to a version that requires a higher Postgres version (for example, GitLab v17, which requires PG v14), the upgrade fails because the main Postgres instance on the affected node is still on the old, unsupported version (v13).

This issue is most likely to occur on large HA deployments, where upgrades are already complex and prone to complications.

Related to:

Affected a large GitLab Ultimate customer. See https://gitlab.com/gitlab-org/distribution/team-tasks/-/issues/1617#note_2113966898

Steps to reproduce

  1. Deploy the 3K Reference Architecture with GET.
  2. Use a GitLab version between 16.8.5 and 16.11.x.
  3. Configure the deployment as a Geo secondary.
  4. Ensure the instance is using PostgreSQL v13.x.
  5. On the Praefect-Postgres node, use the configuration from the gitlab.rb and gitlab.geo.tracking.rb sections below.
    • Note that geo_postgresql['enable'] = true, and that the only role configured is postgres_role.
  6. Run gitlab-ctl reconfigure on the Praefect-Postgres node.
  7. Confirm that the Geo tracking DB is up by running gitlab-ctl status. You should see the following services running:

     # gitlab-ctl status
     run: consul: (pid 1148) 4572s; run: log: (pid 1147) 4572s
     run: geo-postgresql: (pid XXXX) XXXXs; run: log: (pid 6370) 1392s
     run: logrotate: (pid 6567) 972s; run: log: (pid 1130) 4572s
     run: node-exporter: (pid 1144) 4572s; run: log: (pid 1133) 4572s
     run: postgres-exporter: (pid XXXX) XXXXs; run: log: (pid 1146) 4572s
     run: postgresql: (pid XXXX) XXXXs; run: log: (pid 1129) 4572s

  8. Shut down geo-postgresql, postgresql, and postgres-exporter:

     sudo gitlab-ctl stop geo-postgresql
     sudo gitlab-ctl stop postgresql
     sudo gitlab-ctl stop postgres-exporter

  9. Run gitlab-ctl pg-upgrade -V 14.
  10. Verify that the Geo tracking database was upgraded to v14, but the Praefect DB was not:
    • ls -l /opt/gitlab/embedded/postgresql/ shows that both v13 and v14 binaries are present
    • ls -ls /opt/gitlab/embedded/bin shows that the Postgres-related symlinks all point to /opt/gitlab/embedded/postgresql/13/bin
    • cat /var/opt/gitlab/postgresql/data/PG_VERSION outputs 13
    • cat /var/opt/gitlab/geo-postgresql/data/PG_VERSION outputs 14
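The two PG_VERSION checks above can also be scripted. The following is a rough sketch, not part of gitlab-ctl: pg_versions and versions_match? are hypothetical helpers, and the data directory paths are the ones used in the verification step. Reading those files will likely require root on a real node.

```ruby
# Data directories taken from the verification step above.
DATA_DIRS = {
  'postgresql' => '/var/opt/gitlab/postgresql/data',
  'geo-postgresql' => '/var/opt/gitlab/geo-postgresql/data'
}.freeze

# Hypothetical helper: maps each service name to the contents of its
# PG_VERSION file, or nil when that file does not exist.
def pg_versions(data_dirs)
  data_dirs.transform_values do |dir|
    path = File.join(dir, 'PG_VERSION')
    File.exist?(path) ? File.read(path).strip : nil
  end
end

# Hypothetical helper: true when all clusters that were found report the
# same major version.
def versions_match?(versions)
  versions.values.compact.uniq.size <= 1
end

if __FILE__ == $PROGRAM_NAME
  versions = pg_versions(DATA_DIRS)
  versions.each { |name, version| puts "#{name}: #{version || 'no PG_VERSION found'}" }
  warn 'Bundled Postgres clusters are on different versions' unless versions_match?(versions)
end
```

On a node affected by this issue, the script would report 13 for postgresql and 14 for geo-postgresql, and print the mismatch warning to stderr.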

What is the current bug behavior?

In the scenarios described above, running gitlab-ctl pg-upgrade only upgrades the Geo tracking Postgres instance, but not the main Postgres instance on the node. It does not warn you that there is another Postgres instance on the node that has not been upgraded.

What is the expected correct behavior?

  1. When you run gitlab-ctl pg-upgrade, it checks for and alerts you if there are multiple GitLab-bundled database instances on the node.
  2. As @pursultani suggests in this comment, it would be great to have a command-line argument for pg-upgrade to specify which GitLab-bundled Postgres instance you want to upgrade.
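As a rough illustration of the first point, a check along these lines could produce the alert. This is only a sketch under stated assumptions: multiple_db_warning is a hypothetical helper, the list of enabled services is passed in directly, and the service names are the ones shown in the gitlab-ctl status output in this issue. It does not reflect how pg-upgrade is actually structured.

```ruby
# Bundled database services, named as in the gitlab-ctl status output.
BUNDLED_DB_SERVICES = %w[postgresql geo-postgresql].freeze

# Hypothetical helper: returns a warning string when another bundled database
# service is enabled besides the one being upgraded, otherwise nil.
def multiple_db_warning(enabled_services, upgrading:)
  others = (BUNDLED_DB_SERVICES & enabled_services) - [upgrading]
  return nil if others.empty?

  "WARNING: upgrading #{upgrading} only; not upgraded on this node: #{others.join(', ')}"
end

puts multiple_db_warning(%w[consul postgresql geo-postgresql], upgrading: 'geo-postgresql')
# prints "WARNING: upgrading geo-postgresql only; not upgraded on this node: postgresql"
```

A warning like this keeps each run limited to a single cluster, while making it visible that a second cluster on the node still needs its own upgrade.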

Alternative

Add the ability to detect and upgrade all GitLab-bundled database instances on a node with a single execution of the pg-upgrade command. When the geo-postgresql service is enabled, for example, the logic could look something like this:

  elsif service_enabled?('geo-postgresql')
    log 'Detected a Geo PostgreSQL node'
    @instance_type = :geo_postgresql
    geo_pg_upgrade
    if service_enabled?('postgresql')
      general_upgrade # <---- Also upgrade the standard database?
    end

Hossein strongly recommended against this, as the command's logic is over-complicated as it is, and giving it the ability to upgrade multiple DB clusters at a time would add to this problem.

Relevant logs

Details of package version

gitlab.rb

roles ['postgres_role']
patroni['enable'] = false
postgresql['listen_address'] = '0.0.0.0'
postgresql['sql_user_password'] = "REDACTED PASSWORD"
postgresql['trust_auth_cidr_addresses'] = ['0.0.0.0/0']
postgresql['shared_preload_libraries'] = 'pg_stat_statements'
gitlab_rails['auto_migrate'] = false
consul['enable'] = true
consul['configuration'] = {
  bind_addr: '<ip-address>',
  retry_join: %w(<ip-address> <ip-address> <ip-address>)
}
consul['monitoring_service_discovery'] = true
postgres_exporter['listen_address'] = '0.0.0.0:9187'
node_exporter['listen_address'] = '0.0.0.0:9100'
custom_confs = Dir.glob(File.join("/etc/gitlab/", "gitlab.{praefect_postgres}.*.rb"))
custom_confs.each { |conf|
  from_file conf
}
geo_confs = Dir.glob(File.join("/etc/gitlab/", "gitlab.geo.*.rb"))
geo_confs.each { |conf|
  from_file conf
}

gitlab.geo.tracking.rb

geo_postgresql['enable'] = true
geo_postgresql['listen_address'] = '0.0.0.0'
geo_postgresql['sql_user_password'] = "REDACTED PASSWORD"
geo_postgresql['md5_auth_cidr_addresses'] = ['127.0.0.1/32', '10.0.0.0/9']
gitlab_rails['db_host'] = '0.0.0.0'
gitlab_rails['db_password'] = "REDACTED PASSWORD"
gitlab_rails['auto_migrate'] = false
Edited by John Gaughan