zdu_start_point defaults to resources that don't exist

Bug: zero_downtime_upgrade fails unless zdu_start_point is set to gitaly

Description

After the latest update to the GET image, the Zero Downtime Upgrade (ZDU) process in the GitLab Environment Toolkit fails when zdu_start_point is left at its default (consul).

In our environment, PostgreSQL is managed externally (for example, via Terraform) and is not included in the Ansible inventory, but the playbook now still attempts to run the Postgres upgrade stage.

Because no postgres group exists, it fails with the following error:

TASK [zero_downtime_upgrade : Get PostgreSQL leader for ZDU ordering] ************************************
[ERROR]: Task failed: Module failed: non-zero return code
Origin: /gitlab-environment-toolkit/ansible/roles/zero_downtime_upgrade/tasks/main.yml:13:7

11   - name: Update PostgreSQL nodes
12     block:
13       - name: Get PostgreSQL leader for ZDU ordering
         ^ column 7

fatal: [gitlab-community-prod-gitaly-1]: FAILED! => changed=true
  cmd:
  - gitlab-ctl
  - get-postgresql-primary
  delta: '0:00:01.017523'
  end: '2025-10-06 16:42:57.003412'
  msg: non-zero return code
  rc: 1
  start: '2025-10-06 16:42:55.985889'
  stderr: Consul agent is not enabled on this node
  stderr_lines:
  stdout: ''
  stdout_lines:

When the same playbook is run with:

zdu_start_point: gitaly

in the environment’s vars.yml, the upgrade proceeds normally and completes successfully.

The current workaround is:

Add this to your environment’s vars.yml:

zdu_start_point: gitaly

This bypasses the Postgres stage entirely and allows the ZDU to continue from the first relevant GitLab component.

Is this the expected behavior of the new zdu_start_point variable, or is something misconfigured? If it is expected, that is unfortunate, since it is less intuitive and harder to use than the toolkit's previous behavior.


Expected Behavior

The Toolkit should:

  • Automatically skip Postgres-related steps when no postgres group exists in inventory, or
  • Default the zdu_start_point dynamically to the first available component.
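
As a rough illustration of the two options above, here is a minimal Ansible sketch. It is not the toolkit's actual task file: only the block and task names and the gitlab-ctl command are taken from the error output, while the zdu_stage_order list, the zdu_effective_start_point fact, and the postgres_leader variable are hypothetical names introduced for the example.

```yaml
# Option 1: skip the Postgres stage when the inventory has no postgres group.
# Hypothetical guard; the real role may structure this differently.
- name: Update PostgreSQL nodes
  when: "'postgres' in groups and groups['postgres'] | length > 0"
  block:
    - name: Get PostgreSQL leader for ZDU ordering
      ansible.builtin.command: gitlab-ctl get-postgresql-primary
      register: postgres_leader   # hypothetical variable name
      changed_when: false

# Option 2: derive the start point from the first stage that exists in inventory.
# zdu_stage_order is an assumed, incomplete list used only for illustration.
- name: Pick the first ZDU stage that has a matching inventory group
  vars:
    zdu_stage_order:
      - consul
      - postgres
      - gitaly
  ansible.builtin.set_fact:
    zdu_effective_start_point: "{{ zdu_stage_order | select('in', groups) | first | default('consul') }}"
```

With either approach, an inventory that has no postgres group would not reach the get-postgresql-primary call, so the explicit zdu_start_point override would no longer be needed.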

Environment Details

  • GitLab Environment Toolkit 3.8
  • GitLab 18.2.6
Edited by Grant Young