Vault Shared Key (Knowledge Graph <-> GitLab)
## Problem to solve

!224386 added `Analytics::KnowledgeGraph::JwtAuth`, which signs HS256 JWTs for requests between GitLab Rails and the GKG service. It uses the `Gitlab::JwtAuthenticatable` mixin, which auto-generates a local secret file (`.gitlab_knowledge_graph_secret`) on first boot. That works in development but not in Kubernetes:

1. Each Rails pod generates its own secret on startup, so tokens from one pod won't verify on another.
2. The GKG service in `orbit-stg` needs the same key to verify incoming JWTs and sign outbound ones for Gitaly.

Both sides need to share one HS256 key.

The following discussion from !224386 should be addressed:

- [ ] @ggray-gitlab started a [discussion](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/224386#note_3109667381): (+1 comment)

  > @michaelangeloio This has a secret being generated for a synchronous signing and written to disk in a file. Will other services need to be able to validate these tokens? If so, how will they access it? Additionally, if this is running in a cluster, how will we ensure that the secrets stay in sync, or is there an assumption that there will only ever be one instance we need to worry about?

## Proposed solution

Store one shared key per environment in Vault and distribute it to both clusters via ExternalSecrets. Staging and production use separate keys at environment-scoped paths under `k8s/shared/knowledge-graph/`.
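A minimal HS256 round trip shows why per-pod secrets break and why one shared key fixes it. This Python sketch is not `Analytics::KnowledgeGraph::JwtAuth`. `sign_hs256` and `verify_hs256` are hypothetical helpers, and the claims are placeholders. It assumes every party uses the Base64-decoded 32 bytes as the HMAC key. Upstream `Gitlab::JwtAuthenticatable` decodes its secret file that way, but whether the GKG binary does the same is an assumption to confirm.

```python
import base64
import hashlib
import hmac
import json
import secrets


def b64url(data: bytes) -> str:
    # JWS compact serialization uses unpadded base64url (RFC 7515).
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def sign_hs256(claims: dict, key: bytes) -> str:
    # Hypothetical helper: header.payload.signature, where the signature is
    # HMAC-SHA256 over "header.payload".
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, claims)
    )
    signature = hmac.new(key, signing_input.encode("ascii"), hashlib.sha256).digest()
    return f"{signing_input}.{b64url(signature)}"


def verify_hs256(token: str, key: bytes) -> bool:
    # Hypothetical helper: recompute the MAC and compare in constant time.
    # A real verifier must also check "alg" and claims such as exp and iss.
    signing_input, _, signature = token.rpartition(".")
    expected = hmac.new(key, signing_input.encode("utf-8"), hashlib.sha256).digest()
    return hmac.compare_digest(b64url(expected).encode("ascii"), signature.encode("utf-8"))


# Today: every Rails pod writes its own random secret on first boot.
pod_a_key = secrets.token_bytes(32)
pod_b_key = secrets.token_bytes(32)
token = sign_hs256({"sub": "example"}, pod_a_key)
print(verify_hs256(token, pod_a_key))  # True
print(verify_hs256(token, pod_b_key))  # False: pod B (or GKG) rejects it

# Proposed: one Base64 value stored in Vault, decoded the same way everywhere.
vault_value = base64.b64encode(secrets.token_bytes(32)).decode("ascii")
shared_key = base64.b64decode(vault_value, validate=True)
token = sign_hs256({"sub": "example"}, shared_key)
print(verify_hs256(token, shared_key))  # True on every pod and in GKG

# Using the Base64 text itself as the HMAC key gives a different MAC.
print(verify_hs256(token, vault_value.encode("ascii")))  # False
```

HS256 is symmetric, so any holder of the key can both verify and sign. That is why a single Vault value can back both the verifying and the signing key on the GKG side.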
### Architecture

```
Vault (k8s mount)
├── shared/knowledge-graph/stg/jwt        ← staging key
│   └── key: <base64-encoded 32 bytes>
│         │
│         ├──► ESO in gstg-gitlab-gke
│         │      └──► K8s Secret "gitlab-knowledge-graph-jwt-v1"
│         │           in "gitlab" namespace
│         │             └──► mounted as file at .gitlab_knowledge_graph_secret
│         │                  (read by JwtAuthenticatable)
│         │
│         └──► ESO in orbit-stg
│                └──► K8s Secret "gkg-secrets"
│                     in "gkg" namespace
│                       └──► mounted at /etc/secrets/gitlab/jwt/*
│                            (read by GKG binary)
│
└── shared/knowledge-graph/prd/jwt        ← production key (future)
```

Both sides read from the same Vault path via different Kubernetes auth mounts.

### Step 1 — Vault policies (config-mgmt)

**MR:** https://ops.gitlab.net/gitlab-com/gl-infra/config-mgmt/-/merge_requests/13377

**File:** `environments/vault-production/kubernetes.tf`

Two changes in this file:

**a) Grant orbit-stg read access**

Add a `gkg` auth role to the `orbit-stg` cluster block so the `gkg-secrets` service account in the `gkg` namespace can read the shared path:

```terraform
gkg = {
  service_accounts = ["gkg-secrets"]
  namespaces       = ["gkg"]
  readonly_secret_paths = [
    "k8s/shared/knowledge-graph/stg/jwt",
  ]
}
```

**b) Grant gstg Rails read access**

Append the shared path to the existing `gitlab` role's `readonly_secret_paths`:

```terraform
"k8s/shared/knowledge-graph/stg/jwt",
```

**File:** `environments/vault-production/secrets_policies.tf`

**c) Okta admin policy for staging secrets**

Grant the `knowledge_graph` Okta group admin access to staging secrets only, so the team can create and rotate the staging key via the Vault UI:

```terraform
"shared/knowledge-graph/stg/*" = {
  admin = {
    groups = local.groups.knowledge_graph
  }
}
```

### Step 2 — Write the secret to Vault

Generate and store the key (requires Vault admin access via Okta, granted by Step 1c):

```bash
# 32 random bytes, Base64-encoded (44 characters). The Rails side decodes
# this back to exactly 32 bytes.
KEY=$(openssl rand -base64 32)
vault kv put -mount=k8s shared/knowledge-graph/stg/jwt key="$KEY"
```

### Step 3 — Rails side ExternalSecret (k8s-workloads-gitlab-com)

**a) Create the ExternalSecret**

**File:** `releases/gitlab-external-secrets/values/values.yaml.gotmpl`

Add an entry under the existing `gitlab-shared-secrets` SecretStore:

```yaml
gitlab-knowledge-graph-jwt-v1:
  refreshInterval: 0
  secretStoreName: gitlab-shared-secrets
  target:
    creationPolicy: Owner
    deletionPolicy: Delete
  data:
    - remoteRef:
        key: knowledge-graph/stg/jwt
        property: key
        version: "1"
      secretKey: knowledge_graph_jwt_shared_key
```

The `key` is relative to the SecretStore path. `gitlab-shared-secrets` has `path: shared`, so `knowledge-graph/stg/jwt` resolves to `k8s/data/shared/knowledge-graph/stg/jwt` in Vault.

**b) Mount the secret as a file**

`JwtAuthenticatable` reads secrets from a file, not from environment variables. This needs a companion MR to the GitLab Helm chart (`gitlab-org/charts/gitlab`). Add a `_knowledge_graph.tpl` template:

```yaml
{{- define "gitlab.knowledgeGraph.mountSecrets" -}}
{{- if .Values.global.appConfig.knowledgeGraph.enabled -}}
- secret:
    name: {{ .Values.global.appConfig.knowledgeGraph.secret }}
    items:
      - key: {{ .Values.global.appConfig.knowledgeGraph.key }}
        path: knowledge_graph/.gitlab_knowledge_graph_secret
{{- end -}}
{{- end -}}
```

A `define` renders nothing on its own, so the Rails pod templates must also `include` it in their secret volume sources.

Then set the values in `releases/gitlab/values/gstg.yaml.gotmpl`:

```yaml
global:
  appConfig:
    knowledgeGraph:
      enabled: true
      secret: gitlab-knowledge-graph-jwt-v1
      key: knowledge_graph_jwt_shared_key
```

### Step 4 — GKG side vault-secrets release (gitlab-helmfiles)

The GKG Helm chart expects a K8s Secret via `secrets.existingSecret`. In the orbit-stg helmfile (https://gitlab.com/gitlab-com/gl-infra/k8s-workloads/gitlab-helmfiles/-/merge_requests/9971), this is set to `gkg-secrets`.
That secret needs these keys:

| Key | Mount path | Purpose |
|-----|------------|---------|
| `gitlab-jwt-verifying-key` | `/etc/secrets/gitlab/jwt/verifying_key` | Verify incoming JWTs from Rails |
| `gitlab-jwt-signing-key` | `/etc/secrets/gitlab/jwt/signing_key` | Sign outbound JWTs for Gitaly calls |
| `datalake-password` | `/etc/secrets/datalake/password` | ClickHouse datalake access |
| `graph-password` | `/etc/secrets/graph/password` | ClickHouse graph access |

With HS256, the verifying and signing keys hold the same value.

Add a vault-secrets release to `releases/gkg/helmfile.yaml.gotmpl`, following the `data-insights-platform` pattern:

```yaml
- name: gkg-secrets
  chart: oci://registry.ops.gitlab.net/gitlab-com/gl-infra/charts/vault-secrets
  version: ~1.9.0
  namespace: gkg
  installed: {{ .Values | get "gkg.installed" false }}
  labels:
    tier: inf
  values:
    - values-secrets/values.yaml.gotmpl
    - values-secrets/{{ .Environment.Name }}.yaml.gotmpl
```

**File:** `releases/gkg/values-secrets/values.yaml.gotmpl`

```yaml
authMountPath: "kubernetes/{{ default .Values.cluster .Values.cluster_vault }}"
clusterLocation: "{{ .Values.region }}"
clusterName: "{{ .Values.cluster }}"
clusterProject: "{{ .Values.google_project }}"

secretStores:
  - name: gkg-secrets
    role: gkg
    path: shared

serviceAccount:
  name: gkg-secrets
```

**File:** `releases/gkg/values-secrets/orbit-stg.yaml.gotmpl`

```yaml
externalSecrets:
  gkg-secrets:
    refreshInterval: 0
    secretStoreName: gkg-secrets
    target:
      creationPolicy: Owner
      deletionPolicy: Delete
    data:
      - remoteRef:
          key: knowledge-graph/stg/jwt
          property: key
          version: "1"
        secretKey: gitlab-jwt-verifying-key
      - remoteRef:
          key: knowledge-graph/stg/jwt
          property: key
          version: "1"
        secretKey: gitlab-jwt-signing-key
```

ClickHouse passwords (`datalake-password`, `graph-password`) come from separate Vault paths; add them once ClickHouse is provisioned for orbit-stg.

### Deployment order

1. Merge the config-mgmt MR (Vault policies); it is applied via Atlantis.
2. Write the secret to Vault.
3. Merge the k8s-workloads-gitlab-com and charts/gitlab MRs (ExternalSecret and file mount for Rails).
4. Merge the gitlab-helmfiles MR (vault-secrets release for GKG; extends https://gitlab.com/gitlab-com/gl-infra/k8s-workloads/gitlab-helmfiles/-/merge_requests/9971).
5. ArgoCD syncs, and ESO pulls from Vault and creates the K8s Secrets in both clusters.
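After ESO has synced, a quick consistency check can catch a malformed or mismatched key before it shows up as JWT verification failures. It can also be run on the generated value before writing it to Vault. The following is a minimal Python sketch. It assumes the Rails side behaves like upstream `Gitlab::JwtAuthenticatable`, which strictly Base64-decodes the file contents and requires exactly 32 bytes, and that GKG decodes its two key files the same way. `decode_key` and `check_shared_key` are hypothetical helpers, not part of either codebase.

```python
import base64
import hmac

# Mirrors the 32-byte check in upstream Gitlab::JwtAuthenticatable (assumption:
# the Knowledge Graph secret is read the same way, and GKG matches it).
SECRET_LENGTH = 32


def decode_key(value: str) -> bytes:
    """Hypothetical helper: strict Base64 decode to exactly SECRET_LENGTH bytes."""
    text = value.rstrip("\r\n")  # tolerate a trailing newline in the file
    try:
        raw = base64.b64decode(text, validate=True)
    except ValueError as exc:  # binascii.Error is a ValueError subclass
        raise ValueError(f"key is not valid Base64: {exc}") from exc
    if len(raw) != SECRET_LENGTH:
        raise ValueError(f"key decodes to {len(raw)} bytes, expected {SECRET_LENGTH}")
    return raw


def check_shared_key(rails_value: str, gkg_verifying: str, gkg_signing: str) -> None:
    """Hypothetical helper: raise ValueError unless all three copies match."""
    rails_key = decode_key(rails_value)
    for name, value in (("verifying_key", gkg_verifying), ("signing_key", gkg_signing)):
        if not hmac.compare_digest(rails_key, decode_key(value)):
            raise ValueError(f"GKG {name} does not match the Rails key")
```

Pass in the file contents as mounted in each pod: the Rails file at `knowledge_graph/.gitlab_knowledge_graph_secret` under the chart's secrets mount, and GKG's `/etc/secrets/gitlab/jwt/verifying_key` and `signing_key`. Values read from a Secret's `data` field with `kubectl` carry an extra layer of Base64 added by Kubernetes, so decode that layer first.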