GitLab managed cluster resources
How can we do GitLab Managed applications with the agent architecture?
Release notes
Problem to solve
As a Platform Engineer, I want to enable developers to deploy testing/development versions of their applications to either a dedicated or a shared Kubernetes namespace with a dedicated service account, without giving the developers authorization to create namespaces and service accounts, so they can easily share their version for testing without compromising the cluster.
While the agent for Kubernetes works well for the actual deployment, the CI-impersonated "user" should not have authorization to create a new namespace or service account. The CI user should be restricted to the namespace created for it.
Requirements
- Create a Kubernetes namespace and dedicated service account for every new/restarted GitLab environment or per project depending on configuration
- Remove the related namespace when the GitLab environment is stopped
- Allow an environment to be marked as ephemeral or static; do not remove the associated namespace of a static environment when it is stopped
- Do not allow arbitrary namespaces to be used, to avoid hijacking production or restricted namespaces
- Allow the impersonated user (static identity or CI job) to access the newly created namespace (`admin` RoleBinding for the namespace)
- Creating or removing the K8s resources should not consume Runner minutes
- Make it easy to go from the cluster namespace to the GitLab environment
  - use an annotation to go from namespace to environment?
- Make it easy to find the namespace knowing the GitLab environment
  - use a label on the namespace to search by environment?
Proposal
Extend KAS and Rails to support templated generation of namespaces and related RoleBindings.
User flow
The platform engineer configures and installs one or more agents for Kubernetes. The agent is set up with `ci_access` sharing to the application projects, optionally adding templates to the agent (e.g. `.gitlab/agents/my-agent/production.yaml`).
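For reference, the `ci_access` sharing would live in the agent configuration file; a minimal sketch using the existing agent config syntax (the project path is a placeholder):

```yaml
# .gitlab/agents/my-agent/config.yaml
ci_access:
  projects:
    # hypothetical application project allowed to use this agent from CI
    - id: my-group/my-application
```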
The app dev engineer uses an agent in their CI job with the `environment.kubernetes.agent` syntax, optionally specifying a template to use with `environment.kubernetes.template: production.yaml`.
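A sketch of what the job definition could look like; the `agent` and `template` keys under `environment.kubernetes` are proposed syntax from this issue, not shipped CI keywords, and the agent path is a placeholder:

```yaml
deploy-review:
  stage: deploy
  script:
    - kubectl apply -f manifests/
  environment:
    name: review/$CI_COMMIT_REF_SLUG
    kubernetes:
      agent: my-group/agents-project:my-agent  # proposed: agent to deploy through
      template: production.yaml                # proposed: template from the agent directory
```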
Before the related job starts, Rails reaches out to KAS to retrieve the specified template (or the default template). Following the template, Rails+KAS:
- creates a namespace
- creates a RoleBinding for the namespace and the CI job using the `gitlab:project_env:<project id>:<environment slug>` Group
  - TBD: Should it be the `admin`, `editor`, or a custom role? What permissions does the SA created by the GitLab managed cluster functionality have?
- creates a `docker-registry` type Secret for a deploy token
  - initially: there might be no deploy token support
  - initially: the deploy token is created without expiry; we want to add automatic rotation and expiry support later
- attaches the deploy token secret as an `imagePullSecret` for the default service account of the namespace (see the sketch after this list)
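For illustration, the resources created before the job could look roughly like this (the namespace name, role choice, and slugs are hypothetical; the Group subject format comes from the proposal above):

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: my-123-a1b2c  # generated from the template
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: gitlab-ci-access
  namespace: my-123-a1b2c
subjects:
  - kind: Group
    apiGroup: rbac.authorization.k8s.io
    name: "gitlab:project_env:123:review-main"  # impersonated CI identity
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: admin  # TBD above: `admin`, `editor`, or a custom role
---
apiVersion: v1
kind: Secret
metadata:
  name: gitlab-deploy-token
  namespace: my-123-a1b2c
type: kubernetes.io/dockerconfigjson
data:
  .dockerconfigjson: <base64-encoded registry credentials from the deploy token>
---
# the deploy token Secret is attached to the namespace's default service account
apiVersion: v1
kind: ServiceAccount
metadata:
  name: default
  namespace: my-123-a1b2c
imagePullSecrets:
  - name: gitlab-deploy-token
```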
How to handle an existing namespace?
- Start with picking the existing namespace and using it
- Later, we can add an option to make the process fail instead
- For easy uniqueness, the template should support generating a random string
- We should try supporting multiple environments with the labels we apply to the namespace/RoleBinding. This would allow multiple environments to deploy to the same namespace, and still make it easy to search for a namespace based on some environment identifier, e.g. `gitlab.com/environment/<environment slug>/url: <environment url>`
  - We should only apply the non-conflicting labels
Annotations / labels to use on the created resources:
- `gitlab.com/project-url`: GitLab project URL
- `gitlab.com/project-name`: GitLab project name
- `gitlab.com/environment-url`: GitLab environment URL
- `gitlab.com/environment-name`: GitLab environment name or slug
- `gitlab.com/environments/<environment slug>`: GitLab environment name or slug (to support multiple environments)
Proposed template schema:

```yaml
namespace:
  name: my-${project.id}-${random(5)}
  labels:
    customer/label: ${project.id}
  annotations:
    customer/annotation: ${project.path_with_namespace}
registry_access: # do we need to call it `container_registry(access)`? There is also the package registry
  enabled: true
  expiration: 3 months
  kubernetes_secret_name: my-${project.id}-${random(5)}
  default_service_account_binding: true # Kubernetes SA
on_start: create
on_stop: purge
```
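To make the variables concrete, the `namespace` section above might render like this for a hypothetical project with id `123` at `my-group/my-application` (assuming `${random(5)}` expands to five random characters):

```yaml
namespace:
  name: my-123-x7k2q
  labels:
    customer/label: "123"
  annotations:
    customer/annotation: my-group/my-application
```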
How does the above relate to the lifecycle of an environment?
- The resources are created when the environment is created
- The resources are removed when the environment is stopped if `on_stop: purge` is specified (default)
- The resources are recreated when the environment is restarted if `on_start: create` is specified (default)
- Later we might extend the logic with
  - action on the different values of `environment.action`
  - action when the environment is deleted
  - option to scale to zero and back, instead of purging
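For context, these lifecycle hooks map onto the standard environment start/stop jobs in CI; a sketch with placeholder job names:

```yaml
deploy-review:
  stage: deploy
  script:
    - kubectl apply -f manifests/
  environment:
    name: review/$CI_COMMIT_REF_SLUG
    on_stop: stop-review  # stopping the environment would trigger `on_stop: purge`

stop-review:
  stage: deploy
  script:
    - echo "environment stopped"
  environment:
    name: review/$CI_COMMIT_REF_SLUG
    action: stop
  when: manual
```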
To be discussed
- Error handling: What happens when the "pre-job" Kubernetes-related calls fail?
  - Should it fail the job?
  - What is the timeout for these "pre" tasks?
  - Does re-running the job re-run the "pre" tasks too?
- How to configure the agent to enable these features?
- Should we make this functionality available outside of CI?
- What values and functions are available for the template?
Intended users
Feature Usage Metrics
Does this feature require an audit event?
- we should re-use the `deployment_started` event when a deployment to a protected environment starts
- we should add audit events for `environment_stopped` and `environment_created` when protected environments are stopped or created
- separate issues opened
References
- A collection of requirements when we still remembered the issues of the GitLab managed resources from 2022
- Related documentation pages:
- Environment CRD issue (closed)