Chef to vault migration proposal
- Status: proposed
- Deciders: Infra department
- Last update date: 2022-08-22
Context and Problem Statement
As part of our ongoing migration away from several secrets management tools, we want to consolidate on a single tool, Hashicorp Vault. This proposal covers the migration away from our chef-managed secrets configuration. We currently use chef and the secrets cookbook to fetch secrets, expose them as node attributes in the node object, and fill out templates.
Consul template (which neither needs nor depends on consul) would run as a daemon in our VMs. It fetches secrets from vault, watches for template updates, re-renders configuration files whenever changes occur, and can run commands after a template is rendered (for HUPs, reconfigures, etc).
If we switch templating to consul template, we can use chef to install, configure, and manage consul template as well as the application templates, but leave it up to consul template to handle authentication, fetch secrets, and update config files.
As an example, we create a consul template like this:
# cookbook/templates/config.ctmpl
{{ with vault "gitlab/creds/readonly" }}
[database]
username = "{{ .Data.username }}"
password = "{{ .Data.password }}"
{{ end }}
We configure consul template to look for that template and place it in the appropriate location:
template {
  source      = "/etc/gitlab/config.ctmpl"
  destination = "/etc/gitlab/gitlab.rb"
  command     = "sudo gitlab-ctl reconfigure"
}
template_config {
  static_secret_render_interval = "10m"
}
With this configuration, Vault Agent reads the template at /etc/gitlab/config.ctmpl, renders the config and secrets to /etc/gitlab/gitlab.rb, and reconfigures the service when the contents change.
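The render-then-reconfigure behaviour described above can be sketched in plain Ruby. This is only an illustration of the idea (write the destination and run the command only when the rendered content differs), not how consul template or Vault Agent is implemented. The helper name `render_if_changed` and the sample file contents are hypothetical.

```ruby
require 'tmpdir'

# Hypothetical helper: writes `rendered` to `destination` and runs the given
# block (the post-render command) only when the content on disk differs.
# Returns true when the file was written, false when it was left untouched.
def render_if_changed(rendered, destination)
  current = File.exist?(destination) ? File.read(destination) : nil
  return false if current == rendered

  # Write to a temporary file and rename it so readers never see a partial file.
  tmp = "#{destination}.tmp"
  File.write(tmp, rendered)
  File.rename(tmp, destination)

  yield if block_given? # e.g. system('sudo gitlab-ctl reconfigure')
  true
end

# Demo in a throwaway directory: the command only runs when the content changes.
Dir.mktmpdir do |dir|
  destination = File.join(dir, 'gitlab.rb')
  reconfigures = 0

  render_if_changed(%(password = "first"\n), destination) { reconfigures += 1 }
  render_if_changed(%(password = "first"\n), destination) { reconfigures += 1 }
  render_if_changed(%(password = "rotated"\n), destination) { reconfigures += 1 }

  puts "reconfigures: #{reconfigures}" # prints "reconfigures: 2"
end
```

The second call is a no-op because the rendered content matches what is already on disk, which is why an unchanged secret does not trigger a reconfigure.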
Authentication
Authentication and the fetching of secrets are handled by Vault Agent. For GCP, the Google Cloud Platform auth plugin for vault supports both machine and service account authentication types.
Decision Drivers
- Decouple secrets from chef to make it easier to migrate away from chef
- Use a tool that works with whatever we choose for configuration management in the future, such as ansible
Positive Consequences
- We don't need to build tooling
- Offers client-side caching
- Templating via vault template
- Automatically pulls secrets at a configured interval and doesn't rely on chef. We wouldn't need to manually run chef-client or wait 30 minutes in between chef runs to update secrets
- Built-in metrics
- Would help us start moving away from chef
Negative Consequences
- This requires us to install, manage, and operate a separate tool to provide secrets to applications.
- Would require us to rewrite templates currently used
Considered Options
Script with vault ruby client
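If we wanted secrets to keep working like node attributes, a backend for the secrets cookbook could look something like:
# https://gitlab.com/gitlab-cookbooks/gitlab_secrets/-/blob/master/libraries/secrets.rb
require 'vault'

class HashicorpVault
  def initialize(path, key, node)
    @path = path
    @key = key
    @node = node
    Chef::Log.info("gitlab_secrets: BE: 'hashicorp_vault', path: '#{@path}', key:'#{@key}'")
  end

  # Reads the secret from vault, retrying on transient HTTP errors.
  # Assumes @path is a hash whose 'name' entry holds the vault path.
  def get
    secret_path = @path['name']
    Vault.with_retries(Vault::HTTPConnectionError, Vault::HTTPError, attempts: 5) do
      # read returns nil when nothing exists at the path
      return Vault.logical.read(secret_path)&.data
    end
  end
end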
Positive Consequences
- We could continue to follow our chef secrets cookbook convention of using them as node attributes in the node object
- Quicker than migrating template files
Negative Consequences
- The client is not frequently updated or maintained (the last three releases are about a year apart)
- No caching unless we implement it ourselves
- Reliance on chef for secrets management
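To give a sense of what "implementing caching ourselves" would involve, here is a minimal sketch of an in-process TTL cache around a secret read. The class name `SecretCache`, its parameters, and the fake reader are all hypothetical. In real use the block would wrap a call such as `Vault.logical.read`, and a production version would also need thread safety and invalidation, which this sketch omits.

```ruby
# Hypothetical in-process TTL cache, illustrative only (not thread-safe).
class SecretCache
  # `clock` is injectable so the expiry logic can be exercised without sleeping.
  def initialize(ttl_seconds:, clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @ttl = ttl_seconds
    @clock = clock
    @entries = {}
  end

  # Returns the cached value for `path`, calling the block to refetch when the
  # entry is missing or older than the TTL.
  def fetch(path)
    now = @clock.call
    entry = @entries[path]
    return entry[:value] if entry && now - entry[:fetched_at] < @ttl

    value = yield(path)
    @entries[path] = { value: value, fetched_at: now }
    value
  end
end

# Demo with a fake clock and a fake reader standing in for the vault client.
now = 0
reads = 0
cache = SecretCache.new(ttl_seconds: 600, clock: -> { now })
reader = lambda do |_path|
  reads += 1
  { password: "secret-#{reads}" }
end

cache.fetch('gitlab/creds/readonly', &reader) # miss: calls the reader
cache.fetch('gitlab/creds/readonly', &reader) # hit: served from the cache
now = 601
cache.fetch('gitlab/creds/readonly', &reader) # expired: calls the reader again

puts "reads: #{reads}" # prints "reads: 2"
```

Even this small version has to decide on expiry, invalidation, and concurrency, which is part of the maintenance cost the ruby client option would carry.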
Decision Outcome
Chosen option: Vault agent. It is the best option for services we are considering migrating off chef, because it lets us integrate with vault without requiring a specific configuration management tool. We can also use either option depending on the service we are migrating (for example, the ruby client for one-offs and consul templates for fleets).
Migration proposal
Migration should start with roles that are still using the chef-vault backend for secrets management:
- about-gitlab-com.json
- about-staging-gitlab-com.json
- aws-ruby-scripts.json
- azure-ruby-scripts.json
- build-runners-gitlab-org.json
- build-trigger-runner-manager-gitlab-org.json
- ci-base.json
- dev-gitlab-org.json
- gitlab-qa-tunnel.json
- gitlab-runner-gsrm3.json
- gitlab-runner-gsrm4.json
- gitlab-runner-gsrm5.json
- gitlab-runner-gsrm6.json
- gitlab-runner-prm3.json
- gitlab-runner-prm4.json
- gitlab-runner-srm3.json
- gitlab-runner-srm4.json
- gitlab-runner-srm5.json
- gitlab-runner-srm6.json
- gitlab-runner-srm7.json
- gitlab-runner-stg-srm-gce-us-east1-c.json
- gitlab-runner-stg-srm-gce-us-east1-d.json
- gitlab-runners-prometheus-gce-us-east1-c.json
- gitlab-runners-prometheus-gce-us-east1-d.json
- gprd-base.json
- gprd-infra.json
- gprd-wale.json
- gstg-base.json
- gstg-infra.json
- gstg-wale.json
- infra-base.json
- one-off-wale.json
- ops-base.json
- ops-wale.json
- packages-gitlab-com.json
- pre-base.json
- runners-manager-gitlab-qa-blue-1.json
- syslog-client.json
- testbed-base.json
Followed by anything using the gkms secrets backend:
- aptly-gitlab-com.json
- ci-base.json
- db-benchmarking-base-bastion.json
- db-benchmarking-base-blackbox.json
- db-benchmarking-base-db-patroni-2004-cascade.json
- db-benchmarking-base-db-patroni-amcheck-ci-gprd.json
- db-benchmarking-base-db-patroni-amcheck-ci-gstg.json
- db-benchmarking-base-db-patroni-amcheck-main-gprd.json
- db-benchmarking-base-db-patroni-amcheck-main-gstg.json
- db-benchmarking-base-db-patroni-as-test.json
- db-benchmarking-base-db-patroni-bs-test.json
- db-benchmarking-base-db-patroni-ci-data-analytics.json
- db-benchmarking-base-db-patroni-ci-pg12-1604.json
- db-benchmarking-base-db-patroni-ci-pg12-2004.json
- db-benchmarking-base-db-patroni-data-analytics.json
- db-benchmarking-base-db-patroni-main-pg12-1604.json
- db-benchmarking-base-db-patroni-main-pg12-2004.json
- db-benchmarking-base-db-patroni-p213-ci.json
- db-benchmarking-base-db-patroni-p213-main.json
- db-benchmarking-base-db-patroni-registry.json
- db-benchmarking-base-db-patroni-rh-ci-gprd.json
- db-benchmarking-base-db-patroni-rh-main-gprd.json
- db-benchmarking-base-db-patroni.json
- db-benchmarking-base-db-pg12ute-bs-source.json
- db-benchmarking-base-db-pg12ute-bs-target.json
- db-benchmarking-base-db-pg12ute-patroni-source.json
- db-benchmarking-base-db-pg12ute-patroni-target.json
- db-benchmarking-base-db-pgbouncer-common.json
- db-benchmarking-base-db-postgres.json
- db-benchmarking-base.json
- db-benchmarking-infra.json
- db-benchmarking-walg-1604.json
- db-benchmarking-walg-2004.json
- db-integration-base-bastion.json
- db-integration-base-blackbox.json
- db-integration-base-db-patroni.json
- db-integration-base-db-pgbouncer-common.json
- db-integration-base.json
- db-integration-infra-sd-exporter.json
- db-integration-infra.json
- db-integration-walg.json
- dev-gitlab.json
- gitlab-qa-tunnel.json
- gitlab-runner-bastion.json
- gprd-base-bastion-teleport.json
- gprd-base-bastion.json
- gprd-base-blackbox.json
- gprd-base-console-ro-node.json
- gprd-base-db-patroni-2004.json
- gprd-base-db-patroni-registry.json
- gprd-base-db-patroni.json
- gprd-base-db-pgbouncer-common.json
- gprd-base-db-postgres.json
- gprd-base-runner.json
- gprd-base-stor-gitaly-praefect.json
- gprd-base-stor-praefect.json
- gprd-base.json
- gprd-cny-omnibus-version.json
- gprd-infra-consul.json
- gprd-infra-sd-exporter.json
- gprd-infra.json
- gprd-omnibus-version.json
- gprd-walg-2004.json
- gprd-walg.json
- gstg-base-bastion-teleport.json
- gstg-base-blackbox.json
- gstg-base-console-ro-node.json
- gstg-base-db-patroni-2004.json
- gstg-base-db-patroni-ci-2004.json
- gstg-base-db-patroni-ci.json
- gstg-base-db-patroni.json
- gstg-base-db-pgbouncer-common.json
- gstg-base-db-postgres.json
- gstg-base-runner.json
- gstg-base-stor-gitaly-praefect-cny.json
- gstg-base-stor-praefect-cny.json
- gstg-base-stor-praefect.json
- gstg-base.json
- gstg-cny-omnibus-version.json
- gstg-infra-consul.json
- gstg-infra-geo-secondary.json
- gstg-infra-sd-exporter.json
- gstg-infra.json
- gstg-omnibus-version.json
- gstg-ref-omnibus-version.json
- gstg-walg-2004.json
- gstg-walg.json
- ops-base-bastion.json
- ops-base-blackbox.json
- ops-base-runner.json
- ops-base.json
- ops-infra-consul.json
- ops-infra-gitlab.json
- ops-infra-nonprod-proxy.json
- ops-infra-prod-proxy.json
- ops-infra-proxy.json
- ops-infra-sd-exporter.json
- ops-infra-sentry.json
- ops-infra.json
- org-ci-base-bastion.json
- org-ci-base-runner.json
- org-ci-base.json
- prdsub-base-bastion.json
- pre-base-bastion.json
- pre-base.json
- pre-infra-consul.json
- pre-infra-sd-exporter.json
- pre-infra.json
- release-gitlab-omnibus-version.json
- release-gitlab.json
- runners-manager-private.json
- runners-manager-shared-gitlab-org.json
- runners-manager-shared.json
- stgsub-base-bastion.json
- testbed-base.json
- testbed-omnibus-version.json
- windows-ci-base-bastion.json
- windows-ci-base-runner.json
And finally, anything using the Google Secret Manager (GSM) backend.
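Role lists like the ones above could be regenerated with a small script as the migration progresses. The sketch below rests on assumptions that are not confirmed by this document: that each role file is JSON, that a secrets backend is declared under a key named "backend" somewhere in the role, and that the backend identifiers are `chef_vault`, `gkms`, and `gsm`. The function names are hypothetical as well.

```ruby
require 'json'

# Assumed backend identifiers, listed in the migration order proposed above.
MIGRATION_ORDER = %w[chef_vault gkms gsm].freeze

# Recursively collects every string stored under a "backend" key (assumed key name).
def backends_in(node)
  case node
  when Hash
    node.flat_map do |key, value|
      found = key == 'backend' && value.is_a?(String) ? [value] : []
      found + backends_in(value)
    end
  when Array
    node.flat_map { |item| backends_in(item) }
  else
    []
  end
end

# Maps each backend to the sorted list of role files that reference it.
# A role that uses more than one backend appears under each of them.
def roles_by_backend(roles_dir)
  grouped = Hash.new { |hash, key| hash[key] = [] }
  Dir.glob(File.join(roles_dir, '*.json')).sort.each do |file|
    backends_in(JSON.parse(File.read(file))).uniq.each do |backend|
      grouped[backend] << File.basename(file)
    end
  end
  grouped
end

# Returns [backend, roles] pairs in migration order; unused backends get [].
def migration_plan(roles_dir)
  grouped = roles_by_backend(roles_dir)
  MIGRATION_ORDER.map { |backend| [backend, grouped.fetch(backend, [])] }
end

# Usage, assuming role files live in ./roles:
#   migration_plan('roles').each do |backend, roles|
#     puts "#{backend} (#{roles.size} roles)"
#     roles.each { |role| puts "- #{role}" }
#   end
```

Listing a role under every backend it references matches the lists above, where roles such as ci-base.json and gprd-base.json appear under both chef-vault and gkms.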