Improve Vault setup and access for component owners
Overview
Teams adopting the Component Ownership Model need to set up Vault for new or existing GKE/EKS clusters, but the current process is painful and requires significant SRE involvement. This issue tracks improvements needed to make Vault setup accessible to teams without SRE expertise.
Current Challenges
1. Painful Bootstrapping Process
- Setting up Vault for already-created GKE clusters (outside of config-mgmt) is difficult
- Documentation exists but lacks clear step-by-step guidance
- Teams unfamiliar with GitLab's Vault infrastructure struggle to get started
2. Manual Steps Required
- Manual step required for bootstrapping a CI vault token at /data/ops-gitlab-net/gitlab-com/gl-infra/config-mgmt/vault-production/kubernetes/clusters/
- Current process requires manually applying Helm charts to clusters
- This step is challenging to automate
3. Access Restrictions
- Team members outside the infra section don't have default access to vault.gitlab.net
- CLI access via vault kv is not possible for teams outside infra because vault.ops.gke.gitlab.net is available internally only
- Requires access to ops GKE cluster to use CLI tools
- Combined with the manual steps above, Vault access is a requirement for most teams looking to own infrastructure
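A quick way to tell "Vault is not reachable from here" apart from "Vault is reachable but I lack permissions" is to query Vault's unauthenticated health endpoint, /v1/sys/health. The following is a minimal sketch, not an established GitLab procedure: the helper names are hypothetical, and the status-code meanings assume Vault's default health endpoint settings.

```python
import urllib.error
import urllib.request

# Status codes returned by Vault's /v1/sys/health endpoint with default settings.
HEALTH_STATUS = {
    200: "initialized, unsealed, active",
    429: "unsealed, standby",
    472: "disaster recovery secondary",
    473: "performance standby",
    501: "not initialized",
    503: "sealed",
}


def describe_health(status):
    """Translate a /v1/sys/health HTTP status code into a readable state."""
    return HEALTH_STATUS.get(status, f"unexpected HTTP status {status}")


def check_vault(addr, timeout=5.0):
    """Report whether the Vault server at addr answers its health endpoint.

    addr is a base URL including the scheme (hypothetical parameter name).
    """
    url = addr.rstrip("/") + "/v1/sys/health"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return describe_health(resp.status)
    except urllib.error.HTTPError as err:
        # A non-2xx answer still proves the endpoint is reachable.
        return describe_health(err.code)
    except OSError as err:
        # DNS failure, refused connection, timeout: a network path problem.
        return f"unreachable: {err}"
```

Calling check_vault("https://vault.ops.gke.gitlab.net") from a workstation and from a pod in the target cluster (the https scheme is an assumption) shows which side of the network boundary each one sits on. An "unreachable" result points at the internal-only restriction rather than at missing Vault policies.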
4. Ingress to Vault Documentation Gap
- Process for enabling clusters to access vault.ops.gke.gitlab.net with PSC is not well documented
- Can be non-trivial to figure out initially
- Often requires creating external secrets, which adds complexity
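To make the external secrets step more concrete, here is a hedged sketch that builds an ExternalSecret manifest for the External Secrets Operator and prints it as JSON, which kubectl accepts alongside YAML. Every name in it (the secret name, namespace, store name, and Vault key) is a hypothetical placeholder, and the apiVersion is an assumption that depends on the operator version installed in the cluster.

```python
import json


def external_secret_manifest(name, namespace, store_name, vault_key, properties,
                             api_version="external-secrets.io/v1beta1",
                             refresh_interval="1h"):
    """Build an ExternalSecret manifest as a plain dict.

    Each entry in properties becomes one key of the resulting Kubernetes
    Secret, read from the same-named field of the Vault secret at vault_key.
    """
    return {
        "apiVersion": api_version,  # assumption: adjust to the installed operator
        "kind": "ExternalSecret",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "refreshInterval": refresh_interval,
            "secretStoreRef": {"kind": "ClusterSecretStore", "name": store_name},
            "target": {"name": name, "creationPolicy": "Owner"},
            "data": [
                {"secretKey": prop,
                 "remoteRef": {"key": vault_key, "property": prop}}
                for prop in properties
            ],
        },
    }


if __name__ == "__main__":
    # All values below are hypothetical placeholders, not real GitLab paths.
    manifest = external_secret_manifest(
        name="example-app-credentials",
        namespace="example-app",
        store_name="example-vault-store",
        vault_key="example-app/credentials",
        properties=["username", "password"],
    )
    print(json.dumps(manifest, indent=2))
```

The output can be piped into kubectl apply -f - once a matching ClusterSecretStore pointing at Vault exists in the cluster. Setting up that store and its authentication is the part that depends on the PSC ingress described above.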
Goals
Create a clear, tested path for teams without SRE access to:
- Set up Vault for new clusters in GKE and/or EKS
- Onboard existing clusters to Vault
- Minimize SRE involvement in this process
Proposed Solutions
- Improve Documentation
  - Create step-by-step guide for Vault setup for non-SREs
  - Document the PSC ingress process clearly
  - Provide examples of external secrets configuration
- Automate Manual Steps
  - Investigate automating the CI vault token bootstrapping process
  - Reduce manual Helm chart application requirements
- Improve Access
  - Evaluate options for providing CLI access to teams outside infra
  - Consider alternative approaches to vault.ops.gke.gitlab.net access restrictions
- Testing & Validation
  - Create a tested procedure that can be followed by non-SREs
  - Document troubleshooting steps for common issues
Related Issues
- Component Ownership Model feedback: #27175 (closed)
- Infrastructure Support for Usage Billing: gl-infra#1637 (closed)
- Vault usage documentation: https://gitlab.com/gitlab-com/runbooks/-/blob/master/docs/vault/usage.md
- Vault access documentation: https://gitlab.com/gitlab-com/runbooks/-/blob/master/docs/vault/access.md
Success Criteria
- Clear, step-by-step documentation for Vault setup exists and is tested
- Non-SRE teams can successfully set up Vault for new clusters with minimal SRE involvement
- Non-SRE teams can successfully onboard existing clusters to Vault with minimal SRE involvement
- PSC ingress process is clearly documented with examples
- Troubleshooting guide covers common issues