CI Jobs where dry-runs are expected should not make write actions in production

During a recent incident, #661 (closed), a CI job that was expected to run as a dry-run performed destructive actions against the production environment. Use this issue to determine whether we can provide read-only credentials so that a misconfigured dry-run cannot perform destructive actions.

We currently have a service account with cluster-admin privileges. We can do something similar by creating a new service account and granting it only the necessary read-only privileges. This would require modifying our CI to determine which service account to use.
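As a rough illustration of the CI change, the selection could be as small as the Python sketch below. It assumes the DRY_RUN variable proposed further down (defaulting to true) and the SERVICE_KEY_RO variable. The name SERVICE_KEY for the existing read-write key is a placeholder, not something this issue specifies, and the helper names are hypothetical.

```python
import os

# Hypothetical variable names: only SERVICE_KEY_RO appears in this issue.
# SERVICE_KEY stands in for whatever variable holds the read-write key today.
RW_KEY_VAR = "SERVICE_KEY"
RO_KEY_VAR = "SERVICE_KEY_RO"


def is_dry_run(env):
    """Treat the job as a dry-run unless DRY_RUN is explicitly 'false'."""
    return env.get("DRY_RUN", "true").strip().lower() != "false"


def credentials_var(env):
    """Return the name of the CI variable whose key the job should load."""
    return RO_KEY_VAR if is_dry_run(env) else RW_KEY_VAR


if __name__ == "__main__":
    # Print the variable name so a wrapper script can pick up the right key.
    print(credentials_var(os.environ))
```

Falling back to the read-only key whenever DRY_RUN is unset or holds an unexpected value means a misconfigured job fails closed rather than writing to production.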

  • Test the proposed solution - the steps below, but on pre and by hand
  • create RO service accounts for other environments k8s-worksloads-ro - terraform; manual role application containing:
  • create new cluster role secrets-ro - manual, with the get and list verbs on the secrets resource
  • create cluster role bindings for other clusters - manual, documented in k8s-workloads/gitlab-com/README.md
  • modify CI scripts to pull the correct Service Account credentials - hopefully this can be done in our common repo
    • Set env var DRY_RUN default to true
    • if set to false, we'll wrap the gcloud auth ... appropriately to grab the proper credentials
  • add CI variable to all projects/repos in ops
    • Add SERVICE_KEY_RO for all clusters, and for all repos (4 repos X 3 clusters)
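The secrets-ro cluster role from the list above could look roughly like the manifest built below. This is a sketch in Python that emits JSON. The role name and the get and list verbs come from this issue; the function name is hypothetical, and the cluster role binding is left out because the service account identity is not given here.

```python
import json


def secrets_ro_cluster_role(name="secrets-ro"):
    """Build a ClusterRole manifest granting read-only access to secrets."""
    return {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "ClusterRole",
        "metadata": {"name": name},
        "rules": [
            {
                # The empty string selects the core API group, where secrets live.
                "apiGroups": [""],
                "resources": ["secrets"],
                "verbs": ["get", "list"],
            }
        ],
    }


if __name__ == "__main__":
    # kubectl accepts JSON manifests as well as YAML.
    print(json.dumps(secrets_ro_cluster_role(), indent=2))
```

The output can be saved to a file and applied by hand with kubectl apply -f, which fits the manual application described above. Note that get and list on secrets still expose secret contents to the read-only account.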
Edited by John Skarbek