Spike: Google Secret Manager and Managed Certs

Per readiness!44 (comment 412604811), this issue tracks investigating the Google Secret Manager and upcoming Google Certificate Authority Service managed offerings, and comparing them to our currently proposed plan of a "self-managed" secrets management solution with Vault.

We want to evaluate the two options across the following areas:

Secrets Management

For this evaluation, we will consider a secret to be an arbitrary text string (e.g. a password).

How do I go about creating a secret object?
How do I grant one of our classes of machine access to a secret?
Is extra software needed on a virtual machine in order for it to obtain secrets?
Can we version secrets and restore deleted secrets?
How do we go about granting our CI jobs access to secrets? What are the steps/software involved? How do we control which CI jobs have access to which secrets?
How are secrets isolated between our different environments (e.g. gprd, gstg, pre, ops, org-ci, and auxiliary environments)?
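
As a starting point for the isolation question, here is a minimal sketch of how per-environment secret names could be built, assuming (and this is only an assumption) that each environment maps to its own GCP project. The project IDs and the helper name `secret_version_name` are hypothetical; the `projects/.../secrets/.../versions/...` resource name layout is the one Secret Manager uses.

```python
# Hypothetical mapping from our environment names to GCP project IDs.
# These project IDs are placeholders, not our real projects.
ENV_PROJECTS = {
    "gprd": "example-gprd",
    "gstg": "example-gstg",
    "pre": "example-pre",
    "ops": "example-ops",
    "org-ci": "example-org-ci",
}


def secret_version_name(env: str, secret_id: str, version: str = "latest") -> str:
    """Build a Secret Manager resource name scoped to one environment's project."""
    try:
        project = ENV_PROJECTS[env]
    except KeyError:
        # Refuse unknown environments instead of silently falling back to a default.
        raise ValueError(f"unknown environment: {env!r}") from None
    return f"projects/{project}/secrets/{secret_id}/versions/{version}"
```

With a layout like this, isolation would come from IAM bindings on each project, so a gstg machine would never be granted access to anything under the gprd project. Whether one project per environment is the right boundary is part of what this spike should answer.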

PKI Infrastructure Management

How do we go about creating a new PKI CA and associated infrastructure?
How do we control which classes of VMs can generate certificates, and what restrictions can we apply?
How do we set up our VM infrastructure to automatically rotate certificates?
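
Whichever offering we pick, rotation on a VM likely reduces to a periodic check of the certificate's expiry against a renewal window. A rough sketch of that check, assuming a 30-day window (an arbitrary placeholder) and a hypothetical helper name `needs_rotation`:

```python
from datetime import datetime, timedelta, timezone


def needs_rotation(
    not_after: datetime,
    now: datetime,
    renew_before: timedelta = timedelta(days=30),  # assumed window, not a decided value
) -> bool:
    """Return True once `now` falls inside the renewal window before expiry."""
    return now >= not_after - renew_before


# Example: a certificate expiring on 2021-01-31 is due for rotation on 2021-01-15.
expiry = datetime(2021, 1, 31, tzinfo=timezone.utc)
due = needs_rotation(expiry, datetime(2021, 1, 15, tzinfo=timezone.utc))
```

How the new certificate is actually requested and installed is the part that differs between Vault and the managed offering, and is what we need to compare.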

Integration points

What are the integration points for Chef and our VM infrastructure? How easy is it to write a new class in https://gitlab.com/gitlab-cookbooks/gitlab_secrets/-/blob/master/libraries/secrets.rb for the secret manager? What extra tooling is involved?
What are the integration points for Kubernetes secrets? How easy is it to write a new exec function in https://gitlab.com/gitlab-com/gl-infra/k8s-workloads/gitlab-com/-/blob/master/releases/gitlab-secrets/helmfile.yaml to pull secrets?

Operational considerations

How do we handle authentication and authorization for users (not our machines) to interact with the secret management tooling?
How do we handle logging and monitoring of the solution? This includes audit logging, which might need to go to parties outside infrastructure (e.g. security).
How do we handle backups and restores of the data inside the solution?
How do we handle availability and disaster recovery of the solution?
Do we have an understanding of the overall cost profile of the solution?

@ggillies / @devin - created this to start the list. Feel free to add to the description to list out the questions we want to answer.

Edited Sep 28, 2020 by Graeme Gillies
