
Add timeout options for configure_postgresql of pg-upgrade

What does this MR do?

Problem

Related JH issue: https://jihulab.com/gitlab-cn/gitlab/-/issues/686

Upgrading a Patroni replica node with pg-upgrade failed. Investigation showed that when the pg-upgrade command runs, the replica database node uses pg_basebackup to pull a fresh base backup from the leader node. Because GitLab did not detect that the database was up and available within the allotted time, the run was interrupted by a timeout, and the pg_basebackup command was interrupted along with it.

As a workaround I set postgresql['max_service_checks'] = 20 and postgresql['service_check_interval'] = 60. Neither setting appears in the default gitlab.rb. This raised the command timeout from about 3 minutes to 10 minutes, but 10 minutes is still not enough. I could not find a gitlab.rb parameter that raises it further: from the source code, 600s is the limit for the running command, and 10 minutes is too short for a database of 100GB+.
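To make the arithmetic behind those numbers explicit, here is a minimal sketch. It assumes the wait is bounded by whichever is smaller: the service-check budget (checks × interval) or the 600s limit on the running command. The `postgresql` hash is a local stand-in for the one gitlab.rb exposes, and `effective_wait` is a hypothetical helper, not part of Omnibus GitLab.

```ruby
# Stand-in for the hash that gitlab.rb provides; in a real gitlab.rb only the
# two assignments below would be written.
postgresql = {}
postgresql['max_service_checks'] = 20
postgresql['service_check_interval'] = 60 # seconds between checks

# The 600s limit on the running command, as reported in this description.
COMMAND_TIMEOUT = 600

# Hypothetical helper: the wait ends at whichever limit is reached first.
def effective_wait(max_checks, interval, command_timeout)
  [max_checks * interval, command_timeout].min
end

budget = postgresql['max_service_checks'] * postgresql['service_check_interval']
wait = effective_wait(postgresql['max_service_checks'],
                      postgresql['service_check_interval'],
                      COMMAND_TIMEOUT)

puts "service checks allow #{budget}s, command allows #{COMMAND_TIMEOUT}s, effective wait #{wait}s"
```

Under this reading, raising the service-check settings any further cannot help, because the command limit becomes the binding constraint.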

Command output error logs:

    ================================================================================
    Error executing action `run` on resource 'ruby_block[wait for postgresql to start]'
    ================================================================================

    RuntimeError
    ------------
    PostgreSQL did not respond before service checks were exhausted

    Cookbook Trace:
    ---------------
    /opt/gitlab/embedded/cookbooks/cache/cookbooks/gitlab/libraries/helpers/pg_status_helper.rb:56:in `ready?'
    /opt/gitlab/embedded/cookbooks/cache/cookbooks/gitlab/libraries/helpers/base_pg_helper.rb:28:in `is_ready?'
    /opt/gitlab/embedded/cookbooks/cache/cookbooks/patroni/recipes/enable.rb:93:in `block (2 levels) in from_file'

    Resource Declaration:
    ---------------------
    # In /opt/gitlab/embedded/cookbooks/cache/cookbooks/patroni/recipes/enable.rb

     92: ruby_block 'wait for postgresql to start' do
     93:   block { pg_helper.is_ready? }
     94:   only_if { omnibus_helper.should_notify?(patroni_helper.service_name) }
     95: end
     96:

    Compiled Resource:
    ------------------
    # Declared in /opt/gitlab/embedded/cookbooks/cache/cookbooks/patroni/recipes/enable.rb:92:in `from_file'

    ruby_block("wait for postgresql to start") do
      action [:run]
      default_guard_interpreter :default
      declared_type :ruby_block
      cookbook_name "patroni"
      recipe_name "enable"
      block #<Proc:0x00000000044db008 /opt/gitlab/embedded/cookbooks/cache/cookbooks/patroni/recipes/enable.rb:93>
      block_name "wait for postgresql to start"
      only_if { #code block }
    end

    System Info:
    ------------
    chef_version=15.14.0
    platform=centos
    platform_version=7.9.2009
    ruby=ruby 2.7.2p137 (2020-10-01 revision 5445e04352) [x86_64-linux]
    program_name=/opt/gitlab/embedded/bin/chef-client
    executable=/opt/gitlab/embedded/bin/chef-client


    Running handlers:
    Running handlers complete
    Chef Infra Client failed. 4 resources updated in 01 minutes 39 seconds
    ===STDERR===
    There was an error running gitlab-ctl reconfigure:

    ruby_block[wait for postgresql to start] (patroni::enable line 92) had an error: RuntimeError: PostgreSQL did not respond before service checks were exhausted

    ======
    == Fatal error ==
    Error updating PostgreSQL configuration. Please check the output
    == Reverting ==
    ok: down: patroni: 1s, normally up
    Symlink correct version of binaries: OK
    ok: run: patroni: (pid 23741) 1s
    == Reverted ==
    == Reverted to 11.11. Please check output for what went wrong ==

Solution

From the log, I found that the exception is raised in the configure_postgresql method of files/gitlab-ctl-commands/pg-upgrade.rb.

The configure_postgresql method in files/gitlab-ctl-commands/pg-upgrade.rb appears to have no timeout option, so I added a timeout option to the GitlabCtl::Util.chef_run call in configure_postgresql.
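The general idea of letting the caller pass a timeout down to the command runner can be sketched as follows. This is an illustration only: `run_with_timeout` is a hypothetical stand-in written with Ruby's standard library, not the actual GitlabCtl::Util.chef_run, whose signature is not shown here, and the default of 600 seconds simply mirrors the limit mentioned above.

```ruby
require 'rbconfig'
require 'timeout'

# Mirrors the 600s limit described above; an assumption for illustration.
DEFAULT_TIMEOUT = 600

# Hypothetical stand-in for a command runner that accepts a timeout option.
# Returns true if the command exits successfully, false if it exits non-zero,
# and raises Timeout::Error if it runs longer than `timeout` seconds.
def run_with_timeout(command, timeout: DEFAULT_TIMEOUT)
  pid = Process.spawn(*command)
  begin
    _, status = Timeout.timeout(timeout) { Process.wait2(pid) }
    status.success?
  rescue Timeout::Error
    # Stop the child so it does not keep running after the caller gives up.
    Process.kill('TERM', pid)
    Process.wait(pid)
    raise
  end
end

# A caller that expects a long-running step passes a larger value explicitly.
puts run_with_timeout([RbConfig.ruby, '-e', 'exit 0'], timeout: 3600)
```

Keeping the timeout as a keyword argument with a default means existing callers are unaffected, while a caller such as an upgrade step for a large database can opt in to a longer limit.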

/cc @prajnamas

Related issues

Checklist

See Definition of done.

For anything in this list which will not be completed, please provide a reason in the MR discussion

Required

  • Merge Request Title, and Description are up to date, accurate, and descriptive
  • MR targeting the appropriate branch
  • MR has a green pipeline on GitLab.com
  • Pipeline is green on dev.gitlab.org if the change is touching anything besides documentation or internal cookbooks
  • trigger-package has a green pipeline running against latest commit

Expected (please provide an explanation if not completing)

  • Test plan indicating conditions for success has been posted and passes
  • Documentation created/updated
  • Tests added
  • Integration tests added to GitLab QA
  • Equivalent MR/issue for the GitLab Chart opened
Edited by Zehua Zhang
