Pipeline variable CHECKMODE is empty and cannot be overwritten inside Deployer

Summary

In gitlab-org/release-tools!1871 (merged), the pipeline variable CHECKMODE was set to $TEST and is being sent when triggering a post-deployment migrations pipeline. When TEST is set to true, this is fine. However, when TEST is left empty, this causes CHECKMODE to be empty in the Deployer pipeline (sample pipeline):

[image: screenshot of the sample pipeline]

CHECKMODE cannot be overwritten to false by any job because it is a pipeline variable, and pipeline variables take precedence over job-level variables.
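The precedence rule can be modelled in a few lines of shell. This is only a sketch of the behaviour described above, not GitLab's implementation: `resolve_variable` is a hypothetical helper, and the key assumption is that a pipeline variable which is defined but empty still wins over a job-level value.

```shell
# Toy model of the precedence described above (not GitLab's actual code).
# resolve_variable is a hypothetical helper:
#   $1 = "set" or "unset": whether a pipeline variable was sent at all
#   $2 = value of the pipeline variable (may be the empty string)
#   $3 = job-level value
resolve_variable() {
  if [ "$1" = "set" ]; then
    # A pipeline variable wins even when its value is empty.
    printf '%s\n' "$2"
  else
    # No pipeline variable was sent, so the job-level value applies.
    printf '%s\n' "$3"
  fi
}

# PDM pipeline triggered with TEST empty: CHECKMODE is sent as "".
echo "PDM pipeline: CHECKMODE='$(resolve_variable set "" false)'"
# Auto-deploy: CHECKMODE is not sent at all.
echo "auto-deploy:  CHECKMODE='$(resolve_variable unset "" false)'"
```

In this model the first call yields an empty value and the second yields `false`, which matches the difference between the PDM and auto-deploy pipelines described in this issue.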

I found this after merging the MR "feat: Flip DRY_RUN to false and run PDMs in K8S as secondary" (!628), which was supposed to set DRY_RUN to false but failed to do so and left DRY_RUN empty.

In gitlab-com, we expect DRY_RUN == "false" whenever we want to make any change to the infrastructure:

https://gitlab.com/gitlab-com/gl-infra/k8s-workloads/gitlab-com/-/blob/4302f817cf44d44b518cf2cb792fa1839c7d44cf/.gitlab-ci.yml#L2824
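To illustrate why an empty DRY_RUN is a problem, here is a minimal sketch of a strict string comparison. The linked line is not reproduced here; `is_live_run` is a hypothetical helper that only assumes the check is an exact match against the string "false".

```shell
# is_live_run is a hypothetical helper mirroring a strict
# DRY_RUN == "false" comparison: only the exact string "false" passes.
is_live_run() {
  [ "${1-}" = "false" ]
}

for value in "false" "true" ""; do
  if is_live_run "$value"; then
    echo "DRY_RUN='$value' -> changes are applied"
  else
    echo "DRY_RUN='$value' -> no changes are applied"
  fi
done
```

Under this assumption an empty DRY_RUN behaves like `true`, so a pipeline that was meant to make changes silently does not.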

For the time being, I have worked around this by setting DRY_RUN to false explicitly, but this is not a permanent solution. I would like to find an approach that lets us specify the dry-run mode for the PDM pipeline from ChatOps while leaving CHECKMODE entirely under Deployer's control.

This is not a problem with auto-deploy because auto-deploy pipelines triggered by release/tools do not set CHECKMODE at all.

Possible Solutions

Send TEST=false from Chatops

Note

I think we should do this. Skarbek's comment on this issue also suggests that we should do it: #21271 (comment 2584246225)

Proposal: gitlab-com/chatops!573 (merged)

Chatops currently leaves TEST empty instead of setting it to false explicitly.

https://gitlab.com/gitlab-com/chatops/blob/74e722d392926996cce7ae1b5e85c372ba80dbd5/lib/chatops/release/command.rb#L76

This logic happens to be in the Chatops::Release::Command class, which is used by multiple commands inside chatops:

$ rg -l -F '::Chatops::Release::Command'
lib/chatops/commands/feature.rb
lib/chatops/commands/gitaly.rb
lib/chatops/commands/release.rb
lib/chatops/commands/post_deploy_migrations.rb
lib/chatops/commands/auto_deploy.rb
lib/chatops/commands/helm.rb
lib/chatops/commands/production_checks.rb
lib/chatops/commands/rollback.rb

So the effects of such a change need to be considered carefully.

This solution may be preferable to the other two because it is in line with what we do when Deployer triggers a pipeline in gitlab-com: we always set DRY_RUN to the default value of false.

release/tools should send TEST instead of CHECKMODE

The value of TEST can be passed to the downstream pipeline in Deployer. We can use this value inside the before_ci.sh script in Deployer. This will leave us with the following set of dry-run variables:

  1. release/tools: TEST
  2. Deployer: CHECKMODE
  3. gitlab-com: 2 flags for dry-run. Both must be set to the same value.
    1. DRY_RUN: Used for adding jobs to the CI pipeline
    2. dry_run: Used by the k-ctl tool, which is a wrapper around helmfile
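A rough sketch of how before_ci.sh could derive CHECKMODE from TEST under this option. The real contents of before_ci.sh are not shown in this issue, so `derive_checkmode` is a hypothetical helper, and treating every value other than "true" as a real run is an assumption rather than existing behaviour.

```shell
# Hypothetical fragment for before_ci.sh: derive CHECKMODE from TEST.
# Assumes release/tools forwards TEST (possibly empty) and no longer
# sends CHECKMODE as a pipeline variable.
derive_checkmode() {
  case "${1-}" in
    true) echo "true" ;;
    # Assumption: empty, unset, or any other value means a real run.
    *)    echo "false" ;;
  esac
}

CHECKMODE="$(derive_checkmode "${TEST-}")"
export CHECKMODE
echo "TEST='${TEST-}' -> CHECKMODE='${CHECKMODE}'"
```

A stricter variant could fail the job on unexpected values instead of defaulting to a real run. Either way, CHECKMODE is never left empty.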

Set DRY_RUN explicitly in the trigger job in Deployer

When triggering a pipeline in gitlab-com, it is possible to explicitly set DRY_RUN to false. However, this decentralizes the management of DRY_RUN and, in my opinion, makes it harder to figure out where the value is actually being set.

Other Notes

This problem was hard to debug because the prepare-variables CI job shows that CHECKMODE is false. This is because the job-level variable CHECKMODE is indeed false. However, when used inside the trigger job definition, variable precedence kicks in and the value of the pipeline variable is used!

If there is some way to avoid overloading the variable name and to show the pipeline variable's value inside the prepare job, that would also be a nice improvement for future developers in Deployer.
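One possible shape for such a check, as a sketch only: `describe_checkmode` is a hypothetical helper that reports whether CHECKMODE is unset, set but empty, or set to a value. It assumes it would run at the very start of the prepare job, before any script reassigns CHECKMODE, so that it sees the value the job actually received.

```shell
# describe_checkmode is a hypothetical helper. It distinguishes an unset
# variable from one that is set to the empty string, which a plain
# `echo "$CHECKMODE"` cannot do.
describe_checkmode() {
  if [ -z "${CHECKMODE+x}" ]; then
    echo "CHECKMODE is unset"
  elif [ -z "$CHECKMODE" ]; then
    echo "CHECKMODE is set but empty"
  else
    echo "CHECKMODE='$CHECKMODE'"
  fi
}

# Log the incoming state before anything else touches the variable.
describe_checkmode
```

Printing "set but empty" in the job log would have pointed directly at the empty pipeline variable in this case.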

Edited by Siddharth Kannan