Add ability to leverage an external Vault

What does this MR do and why?

This MR allows a management cluster to be deployed against an External Vault (Community edition, or possibly OpenBao; Sylva does not support Vault Enterprise) rather than deploying its own Vault.

This MR assumes that the Vault paths and URL are configurable (!4427 (merged) and !4451 (merged) respectively).

The MR also depends on !6061 (merged), since verifying TLS becomes crucial when communicating with an External Vault.

Assumptions regarding the External Vault

Authentication

The External Vault identities administrator should be the only one responsible for managing the human user accounts that are allowed to connect to it. Hence we chose not to enable Sylva OIDC authentication against Vault: it would introduce a new authentication path outside the control of the External Vault identities administrator.

Limited privileges granted to Sylva

This MR introduces the possibility to configure the External Vault through the Vault API. As a consequence, the Sylva stack could create and configure all the necessary secret engines (Kubernetes authentication and KV store), as well as the policies and roles. However, we consider that this would give too many privileges to the Sylva stack: a compromised stack might compromise the External Vault and, consequently, all the platforms that depend on it.

This is why we chose to let the External Vault be responsible for creating and configuring its secret engine and k8s auth method.

The External Vault is only expected to provide an authentication token (which will be injected in the values of the Sylva stack) with CRUD rights on the path auth/<kubernetes auth path>/config.

The authentication token provided by the External Vault should be revoked by the latter as soon as the Sylva deployment is completed.
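The scope of such a token can be expressed as a Vault policy. The following is a minimal sketch rather than something taken from this MR: the function names, the policy name sylva-k8s-auth-config, the default auth path kubernetes and the one-hour TTL are all assumptions chosen for illustration.

```shell
# Print a policy granting CRUD on the configuration of one Kubernetes auth mount.
# $1: mount path of the Kubernetes auth method (assumed default: kubernetes)
sylva_bootstrap_policy() {
  auth_path="${1:-kubernetes}"
  cat <<EOF
path "auth/${auth_path}/config" {
  capabilities = ["create", "read", "update", "delete"]
}
EOF
}

# Register the policy and print a short-lived token limited to it.
# The policy name and the TTL are illustrative assumptions.
issue_sylva_bootstrap_token() {
  sylva_bootstrap_policy "$1" | vault policy write sylva-k8s-auth-config -
  vault token create -policy=sylva-k8s-auth-config -ttl=1h -field=token
}
```

Once the deployment is complete, the External Vault administrator can invalidate the token with `vault token revoke <token>`.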

Secret Lifecycle

During a deployment, the secrets of the various units are created in the External Vault secret path via the CRD randomsecret. If a secret already exists, it is not modified.

Required Vault Configuration

A key/value (kv) secrets engine, version 2, must be enabled on the External Vault. The path name can be the default name secret or the custom name set in .Values.security.vault.path.secret.

The kubernetes auth method must be enabled to allow some Sylva resources (randomsecret and the vault clustersecretstore) to authenticate with Vault using a Kubernetes Service Account Token. The access policies secret-reader and secret-rw must be configured as well, together with roles binding these policies to the service account vault.

An example of External Vault configuration is given here: enable-vault-auth-k8s.sh
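The referenced script is not reproduced here. As rough orientation only, the requirements above could translate into Vault CLI calls along the following lines. The function name, the policy bodies and the one-hour token TTL are assumptions, not the contents of enable-vault-auth-k8s.sh; the service account vault in the namespace vault is taken from the debugging output further down.

```shell
# Sketch of the configuration an External Vault administrator could apply.
# $1: KV mount path (default: secret)
# $2: Kubernetes auth mount path (default: kubernetes)
prepare_external_vault() {
  kv_path="${1:-secret}"
  auth_path="${2:-kubernetes}"

  vault secrets enable -path="$kv_path" -version=2 kv
  vault auth enable -path="$auth_path" kubernetes

  # Assumed policy bodies: read-only and read-write access to the KV v2 data.
  vault policy write secret-reader - <<EOF
path "${kv_path}/data/*" {
  capabilities = ["read"]
}
EOF
  vault policy write secret-rw - <<EOF
path "${kv_path}/data/*" {
  capabilities = ["create", "read", "update", "delete"]
}
EOF

  # Bind each policy to the service account vault in the namespace vault.
  for role in secret-reader secret-rw; do
    vault write "auth/${auth_path}/role/${role}" \
      bound_service_account_names=vault \
      bound_service_account_namespaces=vault \
      token_policies="${role}" \
      token_ttl=1h
  done
}
```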

Time Synchronization

Use NTP to ensure that the External Vault and the management cluster nodes agree on the current time. When a Sylva component authenticates to Vault, the latter checks the nbf claim in the JWT issued for the service account vault. If the Vault clock is significantly skewed relative to the Sylva control nodes, authentication will fail.
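To diagnose a suspected skew, the nbf claim can be read from a service account token and compared with the clock of the Vault host. The helpers below are a minimal sketch with hypothetical function names. They assume GNU coreutils (base64 -d) and a strict comparison without leeway, which may differ from what Vault actually applies.

```shell
# Print the decoded payload (second segment) of a JWT.
jwt_payload() {
  seg=$(printf '%s' "$1" | cut -d. -f2 | tr '_-' '/+')
  # JWTs strip the base64 padding; restore it before decoding.
  case $(( ${#seg} % 4 )) in
    2) seg="${seg}==" ;;
    3) seg="${seg}=" ;;
  esac
  printf '%s' "$seg" | base64 -d
}

# Print the nbf claim (epoch seconds) of a JWT.
jwt_nbf() {
  jwt_payload "$1" |
    sed -n 's/.*"nbf"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p'
}

# Succeed if the token is already valid at the given time.
# $1: JWT, $2: epoch time on the verifying host (default: local clock)
jwt_valid_at() {
  now="${2:-$(date +%s)}"
  [ "$now" -ge "$(jwt_nbf "$1")" ]
}
```

For instance, `jwt_nbf "$(kubectl -n vault create token vault)"` prints the nbf of a freshly issued token, which can then be compared with the output of `date +%s` on the Vault host.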

Debugging

For debugging purposes, the authentication can be tested with the following script: test-login-sample.sh

Expected Output:

$ ./test-login-sample.sh 
++++++++++++++ Token Vault +++++++++++++++++
eyJhbGciO........

++++++++++++++ Token Vault Decoded +++++++++++++++++
Header:
{
    "alg": "RS256",
    "kid": "snvS0QMUV08-k9vUQelJvJSh_YL5V-uZA9oGtAjEbls"
}
Claims:
{
    "aud": [
        "https://kubernetes.default.svc.cluster.local"
    ],
    "exp": 1758787440,
    "iat": 1758783840,
    "iss": "https://kubernetes.default.svc.cluster.local",
    "jti": "64506005-e52b-4323-b70a-009adbbe6c5f",
    "kubernetes.io": {
        "namespace": "vault",
        "serviceaccount": {
            "name": "vault",
            "uid": "1533d563-ed6b-440b-96ba-3af59f82afbe"
        }
    },
    "nbf": 1758783840,
    "sub": "system:serviceaccount:vault:vault"
}

++++++++++++++ Vault Login +++++++++++++++++
{
  "request_id": "08896d89-006a-ae35-1c1d-50628fba295a",
  "lease_id": "",
  "renewable": false,
  "lease_duration": 0,
  "data": null,
  "wrap_info": null,
  "warnings": null,
  "auth": {
    "client_token": "hvs.CAESIN.......",
    "accessor": "4p..........",
    "policies": [
      "default",
      "secret-rw"
    ],
    "token_policies": [
      "default",
      "secret-rw"
    ],
    "metadata": {
      "role": "secret-rw",
      "service_account_name": "vault",
      "service_account_namespace": "vault",
      "service_account_secret_name": "",
      "service_account_uid": "1533d563-ed6b-440b-96ba-3af59f82afbe"
    },
    "lease_duration": 3600,
    "renewable": true,
    "entity_id": "55adaee0-9bc9-2cf4-baa1-7d1002d50834",
    "token_type": "service",
    "orphan": true,
    "mfa_requirement": null,
    "num_uses": 0
  }
}

Units modified

Basic principle: the deployment relies on an External Vault when .Values.security.vault.external_vault_url is set.

  • vault-oidc: do not enable the unit vault-oidc when relying on an External Vault.

  • eso-secret-stores: The field .spec.provider.vault.server.caProvider in the ClusterSecretStore vault is modified: its CA provider can be either Sylva CA for the internal Vault or .Values.security.vault.external_vault_ca for the External Vault. To do that, the secret name .spec.provider.vault.server.caProvider.name is changed to vault-ca. This secret is configured in the unit vault.

  • The unit vault is not modified and vault-external is introduced:

This new unit configures Sylva and the External Vault so that the k8s resources ClusterSecretStore and RandomSecret can rely on the latter:

  • The External Vault is configured with a long-lived token, issued for the service account token-reviewer-sa. This token is expected to be used by the External Vault to authenticate against the K8S cluster and to validate the tokens submitted by its clients.
  • The secret vault/vault-ca, used by the clustersecretstore vault, is configured from .Values.security.vault.external_vault_ca.
  • When the External Vault does not support TLS (which can happen in a dev environment):
    • The secret vault-ca is not created.
    • The CaProvider configuration is removed from the vault secretstore.
    • A kyverno policy is deployed to remove the field .spec.connection.tLSConfig from the CRD RandomSecret.

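On the Vault side, the first item of the list above corresponds to writing the configuration of the Kubernetes auth method. The following is a minimal sketch with the Vault CLI, assuming the unit performs an equivalent call against auth/<kubernetes auth path>/config. The function name and its argument layout are hypothetical.

```shell
# Point one Kubernetes auth mount of the External Vault at the management cluster.
# $1: Kubernetes auth mount path
# $2: URL of the Kubernetes API server
# $3: file holding the cluster CA certificate (PEM)
# $4: long-lived JWT of the service account token-reviewer-sa
configure_k8s_auth() {
  vault write "auth/$1/config" \
    kubernetes_host="$2" \
    kubernetes_ca_cert=@"$3" \
    token_reviewer_jwt="$4"
}
```

With token_reviewer_jwt set, Vault uses that token, rather than the one presented by the client, when it asks the cluster to review a login token.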
When relying on an External Vault, two service accounts are defined so as not to mix roles:

  • The service account vault, used by the Sylva components randomsecret and clustersecretstore to authenticate against the External Vault.
  • The service account token-reviewer-sa, which has the ClusterRole system:auth-delegator. This role allows Vault to authenticate its clients that connect with ServiceAccounts from the management cluster.
$ kubectl --kubeconfig management-cluster-kubeconfig auth --as=system:serviceaccount:vault:vault can-i create tokenreview
no

$ kubectl --kubeconfig management-cluster-kubeconfig auth --as=system:serviceaccount:vault:token-reviewer-sa can-i create tokenreview
yes

Related reference(s)

Closes issue #2262

Test coverage

  • CI deployment to check that the MR does not break the default deployment relying on an internal KMS
  • Deploy a capo/kadm management cluster relying on an External Vault

CI configuration

Below you can choose test deployment variants to run in this MR's CI.


Legend:

| Icon | Meaning | Available values |
|------|---------|------------------|
| ☁️ | Infra Provider | capd, capo, capm3 |
| 🚀 | Bootstrap Provider | kubeadm (alias kadm), rke2 |
| 🐧 | Node OS | ubuntu, suse |
| 🛠️ | Deployment Options | light-deploy, dev-sources, ha, misc, maxsurge-0, logging |
| 🎬 | Pipeline Scenarios | Available scenario list and description |

  • 🎬 preview ☁️ capd 🚀 kadm 🐧 ubuntu

  • 🎬 preview ☁️ capo 🚀 rke2 🐧 suse

  • 🎬 preview ☁️ capm3 🚀 rke2 🐧 ubuntu

  • ☁️ capd 🚀 kadm 🛠️ light-deploy 🐧 ubuntu

  • ☁️ capd 🚀 rke2 🛠️ light-deploy 🐧 suse

  • ☁️ capo 🚀 rke2 🐧 suse

  • ☁️ capo 🚀 kadm 🐧 ubuntu

  • ☁️ capo 🚀 rke2 🎬 rolling-update 🛠️ ha 🐧 ubuntu

  • ☁️ capo 🚀 kadm 🎬 wkld-k8s-upgrade 🐧 ubuntu

  • ☁️ capo 🚀 rke2 🎬 rolling-update-no-wkld 🛠️ ha 🐧 suse

  • ☁️ capo 🚀 rke2 🎬 sylva-upgrade-from-1.3.x 🛠️ ha 🐧 ubuntu

  • ☁️ capo 🚀 rke2 🎬 sylva-upgrade-from-1.3.x 🛠️ ha,misc 🐧 ubuntu

  • ☁️ capo 🚀 rke2 🛠️ ha,misc 🐧 ubuntu

  • ☁️ capm3 🚀 rke2 🐧 suse

  • ☁️ capm3 🚀 kadm 🐧 ubuntu

  • ☁️ capm3 🚀 kadm 🎬 rolling-update-no-wkld 🛠️ ha,misc 🐧 ubuntu

  • ☁️ capm3 🚀 rke2 🎬 wkld-k8s-upgrade 🛠️ ha 🐧 suse

  • ☁️ capm3 🚀 kadm 🎬 rolling-update 🛠️ ha 🐧 ubuntu

  • ☁️ capm3 🚀 rke2 🎬 sylva-upgrade-from-1.3.x 🛠️ ha 🐧 suse

  • ☁️ capm3 🚀 rke2 🛠️ misc,ha 🐧 suse

  • ☁️ capm3 🚀 rke2 🎬 sylva-upgrade-from-1.3.x 🛠️ ha,misc 🐧 suse

  • ☁️ capm3 🚀 kadm 🎬 rolling-update 🛠️ ha 🐧 suse

  • ☁️ capm3 🚀 ck8s 🎬 no-wkld 🛠️ light-deploy,k8s-1.31 🐧 ubuntu

Global config for deployment pipelines

  • autorun pipelines
  • allow failure on pipelines
  • record sylvactl events

Notes:

  • Enabling autorun makes deployment pipelines run automatically, without human interaction.
  • Disabling allow failure makes deployment pipelines mandatory for pipeline success.
  • If both autorun and allow failure are disabled, deployment pipelines need manual triggering but still block the pipeline.

Be aware: after a configuration change, the pipeline is not triggered automatically. Please run it manually (by clicking the run pipeline button in the Pipelines tab) or push new code.

Edited by Pierrick Seite
