May need to run reconfigure during PG upgrade procedure with Geo

Each time I run this procedure I have needed an extra step, but I am not 100% sure which step or when, since the environment takes a long time to set up and test.

I've narrowed it down to this:

sudo gitlab-rake geo:db:refresh_foreign_tables fails.
To get it to pass, I need at least a sudo gitlab-ctl reconfigure on the read-replica DB node.

I am pretty sure the reconfigure is needed because on this latest attempt I tried refresh_foreign_tables after each restart and each reconfigure, and it only succeeded after a reconfigure on the read-replica DB node.

That reconfigure made several changes:

  * file[/var/opt/gitlab/postgresql/data/server.crt] action create
    - update content in file /var/opt/gitlab/postgresql/data/server.crt from 55fa10 to 2fd1af
    - suppressed sensitive resource
  * file[/var/opt/gitlab/postgresql/data/server.key] action create
    - update content in file /var/opt/gitlab/postgresql/data/server.key from 3a88d5 to 84764a
    - suppressed sensitive resource
  * postgresql_config[gitlab] action create
    * template[/var/opt/gitlab/postgresql/data/postgresql.conf] action create (up to date)
    * template[/var/opt/gitlab/postgresql/data/runtime.conf] action create
      - update content in file /var/opt/gitlab/postgresql/data/runtime.conf from 4d033a to 207bba
      --- /var/opt/gitlab/postgresql/data/runtime.conf    2020-04-21 04:40:34.635066486 +0000
      +++ /var/opt/gitlab/postgresql/data/.chef-runtime20200421-30888-1n4n4ct.conf    2020-04-21 04:47:09.492402070 +0000
      @@ -27,12 +27,12 @@
               # number of seconds; 0 disables
       
       # - Replication
      -wal_keep_segments = 50
      +wal_keep_segments = 10
       
      -max_standby_archive_delay = 30s # max delay before canceling queries
      +max_standby_archive_delay = 60s # max delay before canceling queries
                 # when reading WAL from archive;
                 # -1 allows indefinite delay
      -max_standby_streaming_delay = 30s # max delay before canceling queries
      +max_standby_streaming_delay = 60s # max delay before canceling queries
                 # when reading streaming WAL;
                 # -1 allows indefinite delay
       
    * template[/var/opt/gitlab/postgresql/data/pg_hba.conf] action create
      - update content in file /var/opt/gitlab/postgresql/data/pg_hba.conf from 2e0453 to 496bb7
      --- /var/opt/gitlab/postgresql/data/pg_hba.conf    2020-04-21 04:40:34.379047345 +0000
      +++ /var/opt/gitlab/postgresql/data/.chef-pg_hba20200421-30888-dqwevx.conf    2020-04-21 04:47:09.500402668 +0000
      @@ -71,10 +71,10 @@
       local   all         all                               peer map=gitlab
       
       
      -host    all         all         10.138.0.30/32           md5
      -host    replication gitlab_replicator 10.138.0.30/32     md5
      -host    all         all         10.138.0.110/32           md5
      -host    replication gitlab_replicator 10.138.0.110/32     md5
      +host    all         all         10.138.0.31/32           md5
      +host    replication gitlab_replicator 10.138.0.31/32     md5
      +host    all         all         10.138.0.86/32           md5
      +host    replication gitlab_replicator 10.138.0.86/32     md5
       host    all         all         localhost           md5
       host    replication gitlab_replicator localhost     md5
       
    * template[/var/opt/gitlab/postgresql/data/pg_ident.conf] action create (up to date)
 
  * execute[reload postgresql] action run
    - execute /opt/gitlab/bin/gitlab-ctl hup postgresql

I have no idea why pg_hba.conf changed so drastically: I haven't touched gitlab.rb since before the test setup. Also, pg-upgrade already runs reconfigure before this step.

10.138.0.30 was primary app node
10.138.0.110 was secondary read-replica DB node
10.138.0.31 was secondary app node
10.138.0.86 was secondary tracking DB node

This was the relevant line in the read-replica's /etc/gitlab/gitlab.rb, which didn't change before this reconfigure:

postgresql['md5_auth_cidr_addresses'] = ['10.138.0.31/32', '10.138.0.86/32', 'localhost']

So it makes no sense to me that the read-replica's pg_hba.conf previously authorized the primary app node (10.138.0.30) and its own internal IP (10.138.0.110).
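
To make the mismatch concrete, here is a minimal Python sketch that compares the md5 host entries in a pg_hba.conf against the md5_auth_cidr_addresses list. It is not part of GitLab, and the function names are hypothetical. It assumes each configured address ends up on host lines of the form seen in the diff above, and that addresses are written in CIDR or hostname form (not as a separate IP and netmask). The sample data is copied from the old pg_hba.conf lines and the gitlab.rb setting quoted above.

```python
def md5_host_addresses(pg_hba_text):
    """Collect the ADDRESS field of every 'host ... md5' line."""
    addresses = set()
    for line in pg_hba_text.splitlines():
        # Drop trailing comments, then split on whitespace.
        fields = line.split("#", 1)[0].split()
        # Expected layout: host DATABASE USER ADDRESS METHOD
        if len(fields) >= 5 and fields[0] == "host" and fields[4] == "md5":
            addresses.add(fields[3])
    return addresses


def compare_with_config(configured, pg_hba_text):
    """Return (unexpected, missing) addresses relative to the configured list."""
    actual = md5_host_addresses(pg_hba_text)
    expected = set(configured)
    return sorted(actual - expected), sorted(expected - actual)


# Host lines from pg_hba.conf as it was before the reconfigure.
OLD_PG_HBA = """
host    all         all         10.138.0.30/32           md5
host    replication gitlab_replicator 10.138.0.30/32     md5
host    all         all         10.138.0.110/32           md5
host    replication gitlab_replicator 10.138.0.110/32     md5
host    all         all         localhost           md5
host    replication gitlab_replicator localhost     md5
"""

# Value of postgresql['md5_auth_cidr_addresses'] on the read-replica.
CONFIGURED = ["10.138.0.31/32", "10.138.0.86/32", "localhost"]

unexpected, missing = compare_with_config(CONFIGURED, OLD_PG_HBA)
print("in pg_hba.conf but not in gitlab.rb:", unexpected)
print("in gitlab.rb but not in pg_hba.conf:", missing)
```

Running the same check against /var/opt/gitlab/postgresql/data/pg_hba.conf on the read-replica before and after each step of the upgrade procedure could show at which point the file drifts from gitlab.rb.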

Edited Apr 21, 2020 by Michael Kozono