CI jobs that are expected to be dry runs should not perform write actions in production
During a recent incident, #661 (closed), a CI job that was expected to be a dry run performed destructive actions against the production environment. Use this issue to determine whether we can provide read-only credentials so that a misconfigured dry run cannot perform destructive actions.
We currently have a service account with cluster-admin privileges. We can do something similar by creating a new service account and granting it only the necessary read-only privileges. This would require modifying our CI to determine which service account to use.
- Test proposed solution - the steps below, but on pre and by hand
- Create RO service accounts for other environments: `k8s-worksloads-ro` - terraform; manual role application containing `roles/compute.networkUser` and `roles/container.viewer`
  - https://ops.gitlab.net/gitlab-com/gitlab-com-infrastructure/-/merge_requests/1379
  - #664 (comment 285451625)
- Create new cluster role `secrets-ro` - manual, with `get` and `list` actions for resource `secrets`
- Create cluster role bindings for other clusters - manual, documented in k8s-workloads/gitlab-com/README.md
- Modify CI scripts to pull the correct Service Account credentials - hopefully this can be done in our common repo
  - Set env var `DRY_RUN`, default to true - if set to false we'll wrap the `gcloud auth ...` call appropriately to grab the proper credentials
- Add CI variable to all projects/repos in ops
  - Add `SERVICE_KEY_RO` for all clusters, and for all repos (4 repos X 3 clusters)
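The credential selection described in the CI script step could look roughly like the sketch below. It is not the actual script from the common repo. `DRY_RUN` and `SERVICE_KEY_RO` come from this issue; `SERVICE_KEY` as the name of the existing read-write variable, the function names, and the assumption that each variable holds the JSON key contents (rather than a file path) are all hypothetical.

```shell
#!/bin/sh
# Sketch only. SERVICE_KEY (read-write key) is an assumed variable name;
# DRY_RUN and SERVICE_KEY_RO are the names used in this issue.

# Print the name of the CI variable whose credentials should be used.
select_service_key_var() {
  # DRY_RUN defaults to true. Only the exact string "false" selects the
  # read-write key, so an unset or mistyped value stays read-only.
  if [ "${DRY_RUN:-true}" = "false" ]; then
    echo "SERVICE_KEY"
  else
    echo "SERVICE_KEY_RO"
  fi
}

# Activate the selected service account (requires gcloud on the runner).
# Assumes the CI variables hold the JSON key contents.
activate_service_account() {
  key_file="$(mktemp)"
  if [ "$(select_service_key_var)" = "SERVICE_KEY" ]; then
    printf '%s' "${SERVICE_KEY}" > "${key_file}"
  else
    printf '%s' "${SERVICE_KEY_RO}" > "${key_file}"
  fi
  gcloud auth activate-service-account --key-file="${key_file}"
  status=$?
  # Do not leave key material on disk after activation.
  rm -f "${key_file}"
  return "${status}"
}
```

A job would call `activate_service_account` before any `kubectl` or `helm` step. Treating anything other than the literal `false` as a dry run means a misconfigured job falls back to the read-only account rather than the cluster-admin one.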
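The manual cluster role and binding steps could be done with `kubectl` along the lines of the sketch below. The role name `secrets-ro` and the `get`/`list` verbs on `secrets` come from this issue. The binding name, the function names, and the assumption that the RO service account is bound as a `User` identified by its email address are illustrative only; the documented procedure in k8s-workloads/gitlab-com/README.md takes precedence.

```shell
#!/bin/sh
# Sketch only. The binding name "secrets-ro" and binding the service
# account by email via --user are assumptions, not taken from the README.

# Create the read-only cluster role for secrets.
create_secrets_ro_role() {
  kubectl create clusterrole secrets-ro --verb=get,list --resource=secrets
}

# Bind the role to the RO service account.
# $1: email address of the RO service account (caller supplies it).
bind_secrets_ro_role() {
  if [ -z "$1" ]; then
    echo "usage: bind_secrets_ro_role <service-account-email>" >&2
    return 1
  fi
  kubectl create clusterrolebinding secrets-ro \
    --clusterrole=secrets-ro \
    --user="$1"
}
```

Both functions would need to be run once per cluster, with the `kubectl` context pointed at that cluster.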
Edited by John Skarbek