Pipeline variable CHECKMODE is empty and cannot be overwritten inside Deployer
Summary
In gitlab-org/release-tools!1871 (merged), the pipeline variable CHECKMODE was set to $TEST and is sent when triggering a post-deployment migrations pipeline. When TEST is set to true, this is fine. However, when TEST is left empty, CHECKMODE ends up empty in the Deployer pipeline (sample pipeline).
CHECKMODE then cannot be overwritten to false by any job, because it is a pipeline variable, and pipeline variables take precedence over job-level variables.
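The precedence rule at the heart of this issue can be modeled in a few lines of shell. This is a hypothetical sketch, not GitLab's implementation, and resolve_variable is an illustrative name. It shows the subtle part: a pipeline variable that is defined wins over the job-level value even when it is the empty string, whereas a pipeline variable that is not sent at all (the auto-deploy case) leaves the job-level value in effect.

```shell
#!/usr/bin/env bash
# Hypothetical model of the precedence rule described in this issue:
# a pipeline variable that is defined (even as an empty string) wins
# over the job-level variable of the same name.
# Arguments: $1 = "set" or "unset" (is the pipeline variable defined?)
#            $2 = pipeline variable value, $3 = job-level value
resolve_variable() {
  local pipeline_defined="$1" pipeline_value="$2" job_value="$3"
  if [ "$pipeline_defined" = "set" ]; then
    printf '%s\n' "$pipeline_value"   # may be empty, and still wins
  else
    printf '%s\n' "$job_value"
  fi
}

# PDM pipeline with TEST left empty: CHECKMODE="" beats the job-level false.
resolve_variable set "" false     # prints an empty line
# Auto-deploy: CHECKMODE is not sent, so the job-level value is used.
resolve_variable unset "" false   # prints "false"
```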
I found this after merging the MR "feat: Flip DRY_RUN to false and run PDMs in K8S as secondary" (!628), which was supposed to set DRY_RUN to false but instead left DRY_RUN empty.
- DRY_RUN=$CHECKMODE: https://ops.gitlab.net/gitlab-com/gl-infra/deployer/-/blob/03a797727ee42e7e6df78955bcfd9e696af3280c/.gitlab-ci.yml#L275-278
- Comment thread with more details
In gitlab-com, we expect DRY_RUN == "false" whenever we want to make any change to the infrastructure.
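An empty value is harmful because a check against "false" is an exact string comparison: an empty DRY_RUN is neither "true" nor "false", so it does not count as a live run. The following is a minimal sketch assuming gitlab-com gates infrastructure changes on DRY_RUN being exactly "false"; is_live_run is a hypothetical name, not a function in gitlab-com.

```shell
#!/usr/bin/env bash
# Hypothetical gate, assuming gitlab-com only makes infrastructure changes
# when DRY_RUN is exactly the string "false" (case-sensitive comparison).
is_live_run() {
  [ "${1-}" = "false" ]
}

for value in false true "" FALSE; do
  if is_live_run "$value"; then
    echo "DRY_RUN='${value}': changes applied"
  else
    echo "DRY_RUN='${value}': dry run, nothing applied"
  fi
done
```

Only the first value in the loop is treated as a live run; the empty string, like "FALSE", silently falls into the dry-run branch, which matches the symptom seen after !628.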
For the time being, I have worked around this by setting DRY_RUN to false explicitly, but this is not a permanent solution. I would like to find a way to specify the dry-run mode for the PDM pipeline from ChatOps while leaving CHECKMODE completely controlled within Deployer.
This is not a problem with auto-deploy because auto-deploy pipelines triggered by release/tools do not set CHECKMODE at all.
Possible Solutions
Send TEST=false from Chatops
Note: I think we should do this. Skarbek's comment on this issue also suggests that we should do it: #21271 (comment 2584246225)
Proposal: gitlab-com/chatops!573 (merged)
Chatops currently leaves TEST empty instead of setting it to false explicitly.
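The change amounts to always sending an explicit value. ChatOps itself is written in Ruby; the sketch below only illustrates the normalization in shell, assuming the downstream pipeline is started through the GitLab pipeline trigger API, which accepts variables as variables[KEY]=value form fields. normalize_test_flag is a hypothetical helper, and rejecting unknown values (instead of silently mapping them to false) is a design choice of this sketch, so that a typo cannot turn into a live run.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map the user's dry-run choice to an explicit
# "true"/"false" so the downstream pipeline never receives TEST="".
normalize_test_flag() {
  case "${1-}" in
    "" | false) echo "false" ;;   # not a dry run: send false explicitly
    true) echo "true" ;;
    *)
      echo "invalid TEST value: '${1}'" >&2   # refuse to guess on a typo
      return 1
      ;;
  esac
}

# Build the form field for a trigger request (the request itself is not sent here).
test_flag="$(normalize_test_flag "${TEST-}")" && echo "variables[TEST]=${test_flag}"
```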
This logic happens to be in the Chatops::Release::Command class, which is used by multiple commands inside chatops:
$ rg -l -F '::Chatops::Release::Command'
lib/chatops/commands/feature.rb
lib/chatops/commands/gitaly.rb
lib/chatops/commands/release.rb
lib/chatops/commands/post_deploy_migrations.rb
lib/chatops/commands/auto_deploy.rb
lib/chatops/commands/helm.rb
lib/chatops/commands/production_checks.rb
lib/chatops/commands/rollback.rb
So the effects of such a change on these other commands may need to be considered carefully.
This solution may be preferable to the other two because it is in line with what we do when Deployer triggers a pipeline in gitlab-com: we always set DRY_RUN to the default value of false.
release/tools should send TEST instead of CHECKMODE
The value of TEST can be passed to the downstream pipeline in Deployer. We can use this value inside the before_ci.sh script in Deployer. This will leave us with the following set of dry-run variables:
- release/tools: TEST
- Deployer: CHECKMODE
- gitlab-com: 2 flags for dry-run. Both must be set to the same value.
  - DRY_RUN: Used for adding jobs to the CI pipeline
  - dry_run: Used by the k-ctl tool, which is a wrapper around helmfile
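Under this option, Deployer's before_ci.sh would be the single place that derives CHECKMODE from TEST, and the two gitlab-com flags from CHECKMODE. The sketch below is hypothetical: it is not the current content of before_ci.sh, derive_dry_run_flags is an illustrative name, and how the derived values reach the trigger job (for example via the prepare-variables job) is not shown. Deriving DRY_RUN and dry_run from one value in one place is what enforces the "both must be set to the same value" rule.

```shell
#!/usr/bin/env bash
# Hypothetical excerpt of Deployer's before_ci.sh (not the real script).
# $1 is the TEST value received from release/tools; CHECKMODE stays owned
# by Deployer and is derived here instead of being a pipeline variable.
derive_dry_run_flags() {
  case "${1-}" in
    true) CHECKMODE="true" ;;
    "" | false) CHECKMODE="false" ;;   # assumption: empty means "not a dry run"
    *) echo "unexpected TEST value: '${1}'" >&2; return 1 ;;
  esac
  # Both gitlab-com flags come from the same value so they cannot diverge.
  DRY_RUN="$CHECKMODE"
  dry_run="$CHECKMODE"
  export CHECKMODE DRY_RUN dry_run
}

derive_dry_run_flags "${TEST-}" && echo "CHECKMODE=$CHECKMODE DRY_RUN=$DRY_RUN dry_run=$dry_run"
```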
Set DRY_RUN explicitly in the trigger job in Deployer
When triggering a pipeline in gitlab-com, it is possible to explicitly set DRY_RUN to false. This decentralizes the management of DRY_RUN and, in my opinion, makes it harder to figure out where its value is actually set.
Other Notes
This problem was hard to debug because the prepare-variables CI job shows that CHECKMODE is false. This is because the job-level variable CHECKMODE is indeed false. However, when used inside the trigger job definition, variable precedence kicks in and the value of the pipeline variable is used!
- Prepare job shows CHECKMODE=false
- gitlab-com pipeline variables show DRY_RUN and dry_run as empty
If there is some way to avoid overloading CHECKMODE (a pipeline variable and a job-level variable with the same name) and to show the pipeline variable's status inside the prepare job, that would also be a nice improvement for future developers in Deployer.
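A diagnostic along these lines could make the hidden state visible. It is a hypothetical sketch: describe_variable is an illustrative name, and it assumes the prepare job computes CHECKMODE in its own script, so calling it at the start of that script, before anything is reassigned, reports what the job actually received. It distinguishes an unset variable from one set to the empty string, which is exactly the case that was hidden here, but it cannot tell which precedence level supplied a value.

```shell
#!/usr/bin/env bash
# Hypothetical diagnostic for the prepare job: report whether each
# dry-run variable is unset, set but empty, or set to a value.
# Uses bash indirect expansion (${!name}) to look up a variable by name.
describe_variable() {
  local name="$1"
  if [ -z "${!name+x}" ]; then
    echo "${name}: unset"
  elif [ -z "${!name}" ]; then
    echo "${name}: set but EMPTY"
  else
    echo "${name}=${!name}"
  fi
}

for var in TEST CHECKMODE DRY_RUN dry_run; do
  describe_variable "$var"
done
```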
