zdu_start_point defaults to resources that don't exist
Bug: zero_downtime_upgrade fails unless zdu_start_point is set to gitaly
Description
After the latest update to the GET image, the Zero Downtime Upgrade (ZDU) process in the GitLab Environment Toolkit fails when zdu_start_point is left at its default (consul).
In our environment, PostgreSQL is managed externally (for example via Terraform) and is not included in the Ansible inventory, yet the playbook now attempts to run the Postgres upgrade stage anyway.
Because no postgres group exists, it fails with the following error:
TASK [zero_downtime_upgrade : Get PostgreSQL leader for ZDU ordering] ****************************************************************************************************
[ERROR]: Task failed: Module failed: non-zero return code
Origin: /gitlab-environment-toolkit/ansible/roles/zero_downtime_upgrade/tasks/main.yml:13:7

11 - name: Update PostgreSQL nodes
12   block:
13       - name: Get PostgreSQL leader for ZDU ordering
         ^ column 7

fatal: [gitlab-community-prod-gitaly-1]: FAILED! => changed=true
  cmd:
  - gitlab-ctl
  - get-postgresql-primary
  delta: '0:00:01.017523'
  end: '2025-10-06 16:42:57.003412'
  msg: non-zero return code
  rc: 1
  start: '2025-10-06 16:42:55.985889'
  stderr: Consul agent is not enabled on this node
  stderr_lines:
  stdout: ''
  stdout_lines:
When the same playbook is run with:
zdu_start_point: gitaly
in the environment’s vars.yml, the upgrade proceeds normally and completes successfully.
The current workaround is:
Add this to your environment’s vars.yml:
zdu_start_point: gitaly
This bypasses the Postgres stage entirely and allows the ZDU to continue from the first relevant GitLab component.
Is this the expected behavior of the new zdu_start_point variable, or is something misconfigured on our side? If it is expected, that is unfortunate, because it is less intuitive and harder to use than the toolkit's previous behavior.
Expected Behavior
The Toolkit should:
- Automatically skip Postgres-related steps when no postgres group exists in inventory, or
- Default the zdu_start_point dynamically to the first available component.
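To make the two options above concrete, here is a rough sketch of what they could look like in Ansible. This is not the toolkit's actual task file: only the two task names and the gitlab-ctl command come from the error output above. The postgres group name, the registered variable, the zdu_component_order list, and the zdu_effective_start_point fact are assumptions made up for illustration.

```yaml
# Hypothetical sketch, not the real roles/zero_downtime_upgrade/tasks/main.yml.

# Option 1: skip the whole PostgreSQL stage when the inventory has no
# (or an empty) postgres group. The group name `postgres` is an assumption.
- name: Update PostgreSQL nodes
  when: groups['postgres'] | default([]) | length > 0
  block:
    - name: Get PostgreSQL leader for ZDU ordering
      ansible.builtin.command: gitlab-ctl get-postgresql-primary
      register: postgres_primary_result  # hypothetical variable name
      changed_when: false

# Option 2: derive the start point from the components that are actually
# present in the inventory. `zdu_component_order` is a hypothetical list;
# this report only names consul, postgres and gitaly, and the real
# ordering may contain more entries.
- name: Pick the first ZDU component that exists as an inventory group
  ansible.builtin.set_fact:
    zdu_effective_start_point: "{{ zdu_component_order | select('in', groups) | first }}"
  vars:
    zdu_component_order:
      - consul
      - postgres
      - gitaly
```

With either approach, an inventory that has no postgres group would not reach the get-postgresql-primary call, so setting zdu_start_point: gitaly by hand would no longer be required.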
Environment Details
- GitLab Environment Toolkit 3.8
- 18.2.6