fix: Set sv timeout when restarting Gitaly

What does this MR do?

Set sv timeout when restarting Gitaly

Gitaly has a graceful timeout of 60 seconds by default, but sv restart times out after 7 seconds. If a long operation (clone/push) is running when Gitaly is restarted (due to an environment variable change, for example), sv may well time out while Gitaly is still gracefully waiting, causing gitlab-ctl reconfigure to fail. A re-run succeeds because Gitaly has already been restarted, and by the time a human has noticed, the graceful restart has usually completed.

This sort of behavior has been reported in #4445 (closed) and gitlab#341573 (closed). More particularly, we've seen it recently in Dedicated, with busy Gitaly servers (known long clones) and changing ulimit-related env vars, and it caused angst.

Setting the sv timeout (-w, using sv_timeout) to a little bit more than the graceful timeout (whatever it is configured to be) should dull this particular edge case.

Needs the go_duration gem to reliably parse the Go duration that can be provided to Gitaly.
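
The calculation described above can be sketched roughly as follows. This is an illustration only, not the code in this MR: the helper names `go_duration_seconds` and `sv_timeout_for` and the 5-second margin are assumptions, and the small hand-rolled parser stands in for the go_duration gem (whose API is not reproduced here). It handles only a subset of Go duration syntax (unsigned values with the units ns, us, ms, s, m, h).

```ruby
# Hypothetical stand-in for the go_duration gem: converts a Go duration
# string such as "1m30s" into seconds. Names here are illustrative only.
GO_DURATION_UNITS = {
  'ns' => 1e-9,
  'us' => 1e-6,
  'ms' => 1e-3,
  's'  => 1,
  'm'  => 60,
  'h'  => 3600
}.freeze

# One token is a decimal number followed by a unit, e.g. "1.5h" or "30s".
GO_DURATION_TOKEN = /(\d+(?:\.\d*)?|\.\d+)(ns|us|ms|s|m|h)/

def go_duration_seconds(value)
  str = value.to_s.strip
  tokens = str.scan(GO_DURATION_TOKEN)
  # Reject input that is empty or contains anything besides valid tokens.
  if tokens.empty? || tokens.map(&:join).join != str
    raise ArgumentError, "invalid Go duration: #{value.inspect}"
  end

  tokens.sum { |number, unit| number.to_f * GO_DURATION_UNITS.fetch(unit) }
end

# sv timeout in whole seconds: the graceful timeout rounded up, plus an
# assumed safety margin so sv outlasts Gitaly's graceful shutdown.
def sv_timeout_for(graceful_timeout, margin: 5)
  go_duration_seconds(graceful_timeout).ceil + margin
end
```

With the default graceful timeout of 60 seconds, `sv_timeout_for('60s')` gives 65, which matches the value used in the ad-hoc test described under Testing.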

Internal:

Testing: I did a little ad-hoc testing live on a test deployment (setting sv_timeout to 65 in enable.rb, omitting the more robust gem-based duration calculation) and it had the expected effect. With a long git operation running, reconfigure completed successfully after waiting 65 seconds (interrupting the push, but that's expected) vs failing at 7 seconds because Gitaly hadn't restarted yet.

Checklist

See Definition of done.

For anything in this list which will not be completed, please provide a reason in the MR discussion.

Required

  • MR title and description are up to date, accurate, and descriptive.
  • MR targeting the appropriate branch.
  • Latest Merge Result pipeline is green.
  • When ready for review, MR is labeled workflow::ready for review per the Distribution MR workflow.

For GitLab team members

If you don't have access to this, the reviewer should trigger these jobs for you during the review process.

  • The manual Trigger:ee-package jobs have a green pipeline running against latest commit.
  • If config/software or config/patches directories are changed, make sure the build-package-on-all-os job within the Trigger:ee-package downstream pipeline succeeded.
  • If you are changing anything SSL related, then the Trigger:package:fips manual job within the Trigger:ee-package downstream pipeline must succeed.
  • If CI configuration is changed, the branch must be pushed to dev.gitlab.org to confirm regular branch builds aren't broken.

Expected (please provide an explanation if not completing)

  • Test plan indicating conditions for success has been posted and passes.
  • Documentation created/updated.
  • Tests added.
  • Integration tests added to GitLab QA.
  • Equivalent MR/issue for the GitLab Chart opened.
  • Validate potential values for new configuration settings. Formats such as integer 10, duration 10s, URI scheme://user:passwd@host:port may require quotation or other special handling when rendered in a template and written to a configuration file.
Edited by Craig Miskell
