Spike Google Secrets Manager and Managed Certs
Per readiness!44 (comment 412604811), this issue tracks investigating the Google Secrets Manager and upcoming Google Certificate Authority Service managed offerings, and comparing them to our currently proposed plan of a "self-managed" secrets management solution with Vault.
We will want to evaluate the two options across the following sections:
Secrets Management
For this we will consider a secret to be some kind of arbitrary text string (e.g. password).
- How do I go about creating a secret object?
- How do I grant a class of machines access to a secret?
- Is extra software needed on a virtual machine in order for it to obtain secrets?
- Do we have the ability to do versioned secrets, and to restore deleted secrets?
- How do we go about granting our CI jobs access to secrets? What are the steps/software involved? How do we control which CI jobs have access to which secrets?
- How is secret isolation encapsulated between our different environments (e.g. gprd, gstg, pre, ops, org-ci, and auxiliary environments)?
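To make the environment isolation question concrete, here is a minimal sketch of one possible layout, assuming each environment gets its own GCP project. That one-project-per-environment layout is an assumption for discussion, not a decision; the project IDs and the `secret_version_name` helper are hypothetical. The only fixed part is the Secret Manager resource name format, `projects/<project>/secrets/<secret-id>/versions/<version>`.

```python
# Hypothetical mapping of environment name -> GCP project ID.
# These project IDs are placeholders, not our real projects.
ENV_PROJECTS = {
    "gprd": "example-gprd-project",
    "gstg": "example-gstg-project",
    "pre": "example-pre-project",
    "ops": "example-ops-project",
    "org-ci": "example-org-ci-project",
}


def secret_version_name(env: str, secret_id: str, version: str = "latest") -> str:
    """Build the Secret Manager resource name for a secret version in one environment.

    Raises ValueError for an environment that is not in ENV_PROJECTS, so a typo
    cannot silently resolve to another environment's project.
    """
    try:
        project = ENV_PROJECTS[env]
    except KeyError:
        raise ValueError(f"unknown environment: {env!r}") from None
    return f"projects/{project}/secrets/{secret_id}/versions/{version}"


if __name__ == "__main__":
    # Same secret ID, different environments, different projects.
    print(secret_version_name("gprd", "db-password"))
    print(secret_version_name("gstg", "db-password", version="3"))
```

With a layout like this, access would be scoped by IAM on each project, so a gstg machine would have no path to a gprd secret even when the secret IDs match. Whether that holds up in practice is part of what the spike should confirm.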
PKI Infrastructure Management
- How do we go about creating a new PKI CA and associated infrastructure?
- How do we control which classes of VMs have access to generate certificates, and what restrictions can we place around that?
- How do we set up our VM infrastructure to automatically rotate certificates?
Integration points
- What are the integration points for Chef and our VM infrastructure? How easy is it to write a new class in https://gitlab.com/gitlab-cookbooks/gitlab_secrets/-/blob/master/libraries/secrets.rb for the secret manager? What extra tooling is involved?
- What are the integration points for Kubernetes secrets? How easy is it to write a new exec function in https://gitlab.com/gitlab-com/gl-infra/k8s-workloads/gitlab-com/-/blob/master/releases/gitlab-secrets/helmfile.yaml to pull secrets?
Operational considerations
- How do we handle authentication and authorization for users (not our machines) to interact with the secret management tooling?
- How do we handle logging and monitoring of the solution? This includes audit logging, which might need to go to parties outside infrastructure (e.g. security).
- How do we handle backups and restores of the data inside the solution?
- How do we handle availability and disaster recovery of the solution?
- Do we have an understanding of the overall cost profile of the solution?
@ggillies / @devin - created this to start the list. Feel free to add to the description to list out the questions we want to answer.
Edited by Graeme Gillies