GitLab managed cluster resources
How can we do GitLab Managed applications with the agent architecture?
Release notes
Problem to solve
As a Platform Engineer, I want to enable developers to deploy testing/development versions of their applications to either a dedicated or a shared Kubernetes namespace with a dedicated service account, without giving the developers authorization to create namespaces and service accounts, so they can easily share their version for testing without compromising the cluster.
While the agent for Kubernetes works well for the actual deployment, the CI-impersonated "user" should not have authorization to create a new namespace or service account. The CI user should be restricted to the namespace created for it.
Requirements
- Create a Kubernetes namespace and dedicated service account for every new/restarted GitLab environment or per project depending on configuration
- Remove the related namespace when the GitLab environment is stopped
- Allow an environment to be marked as ephemeral or static; do not remove the associated namespace of a static environment when it is stopped
- Do not allow arbitrary namespaces to be used, to avoid hijacking production or restricted namespaces
- Allow the impersonated user (static identity or CI job) to access the newly created namespace (`admin` RoleBinding for the namespace)
- Creating or removing the K8s resources should not consume Runner minutes
- Make it easy to go from the cluster namespace to the GitLab environment
  - use an annotation to go from namespace to environment?
- Make it easy to find the namespace knowing the GitLab environment
  - use a label on the namespace to search by environment?
Proposal
Extend KAS and Rails to support templated generation of namespaces and related RoleBindings.
User flow
The platform engineer configures and installs one or more agents for Kubernetes. The agent is set up with `ci_access` sharing to the application projects, optionally adding templates to the agent (e.g. `.gitlab/agents/my-agent/production.yaml`).
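For reference, the `ci_access` sharing would live in the agent configuration file; a minimal sketch using the existing agent config syntax (the project path is a placeholder):

```yaml
# .gitlab/agents/my-agent/config.yaml
ci_access:
  projects:
    # hypothetical application project allowed to use this agent from CI
    - id: my-group/my-application
```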
The app dev engineer uses an agent in their CI job with the `environment.kubernetes.agent` syntax, optionally specifying a template to use with `environment.kubernetes.template: production.yaml`.
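A sketch of what the job definition could look like; the `agent` and `template` keys under `environment.kubernetes` are proposed syntax from this issue, not shipped CI keywords, and the agent path is a placeholder:

```yaml
deploy-review:
  stage: deploy
  script:
    - kubectl apply -f manifests/
  environment:
    name: review/$CI_COMMIT_REF_SLUG
    kubernetes:
      agent: my-group/agents-project:my-agent  # proposed: agent to deploy through
      template: production.yaml                # proposed: template from the agent directory
```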
Before the related job starts, Rails reaches out to KAS to retrieve the specified template (or the default template). Following the template, Rails+KAS:
- creates a namespace
- creates a RoleBinding for the namespace and the CI job using the `gitlab:project_env:<project id>:<environment slug>` Group
  - TBD: Should it be the `admin`, `editor`, or a custom role? What permissions does the SA created by the GitLab managed cluster functionality have?
- creates a `docker-registry` type Secret for a deploy token
  - initially: there might be no deploy token support
  - initially: the deploy token is created without expiry; we want to add automatic rotation and expiry support later
- attaches the deploy token secret as an `imagePullSecret` for the default service account of the namespace (see the sketch after this list)
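For illustration, the resources created before the job could look roughly like this (the namespace name, role choice, and slugs are hypothetical; the Group subject format comes from the proposal above):

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: my-123-a1b2c  # generated from the template
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: gitlab-ci-access
  namespace: my-123-a1b2c
subjects:
  - kind: Group
    apiGroup: rbac.authorization.k8s.io
    name: "gitlab:project_env:123:review-main"  # impersonated CI identity
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: admin  # TBD above: `admin`, `editor`, or a custom role
---
apiVersion: v1
kind: Secret
metadata:
  name: gitlab-deploy-token
  namespace: my-123-a1b2c
type: kubernetes.io/dockerconfigjson
data:
  .dockerconfigjson: <base64-encoded registry credentials from the deploy token>
---
# the deploy token Secret is attached to the namespace's default service account
apiVersion: v1
kind: ServiceAccount
metadata:
  name: default
  namespace: my-123-a1b2c
imagePullSecrets:
  - name: gitlab-deploy-token
```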
How to handle an existing namespace?
- Start with picking the existing namespace and using it
- Later, we can add an option to make the process fail instead
- For easy uniqueness, the template should support generating a random string
- We should try supporting multiple environments with the labels we apply to the namespace/RoleBinding. This would allow multiple environments to deploy to the same namespace, and still make it easy to search for a namespace based on some environment identifier, e.g. `gitlab.com/environment/<environment slug>/url: <environment url>`
  - We should only apply the non-conflicting labels
Annotations / labels to use on the created resources:
- `gitlab.com/project-url`: GitLab project URL
- `gitlab.com/project-name`: GitLab project name
- `gitlab.com/environment-url`: GitLab environment URL
- `gitlab.com/environment-name`: GitLab environment name or slug
- `gitlab.com/environments/<environment slug>`: GitLab environment name or slug (to support multiple environments)
Proposed template schema:

```yaml
namespace:
  name: my-${project.id}-${random(5)}
  labels:
    customer/label: ${project.id}
  annotations:
    customer/annotation: ${project.path_with_namespace}
registry_access: # do we need to call it `container_registry(access)`? There is also the package registry
  enabled: true
  expiration: 3 months
  kubernetes_secret_name: my-${project.id}-${random(5)}
  default_service_account_binding: true # Kubernetes SA
on_start: create
on_stop: purge
```
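To make the variables concrete, the `namespace` section above might render like this for a hypothetical project with id `123` at `my-group/my-application` (assuming `${random(5)}` expands to five random characters):

```yaml
namespace:
  name: my-123-x7k2q
  labels:
    customer/label: "123"
  annotations:
    customer/annotation: my-group/my-application
```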
How does the above relate to the lifecycle of an environment?
- The resources are created when the environment is created
- The resources are removed when the environment is stopped if `on_stop: purge` is specified (default)
- The resources are recreated when the environment is restarted if `on_start: create` is specified (default)
- Later we might extend the logic with
  - action on the different values of `environment.action`
  - action when the environment is deleted
  - option to scale to zero and back, instead of purging
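For context, these lifecycle hooks map onto the standard environment start/stop jobs in CI; a sketch with placeholder job names:

```yaml
deploy-review:
  stage: deploy
  script:
    - kubectl apply -f manifests/
  environment:
    name: review/$CI_COMMIT_REF_SLUG
    on_stop: stop-review  # stopping the environment would trigger `on_stop: purge`

stop-review:
  stage: deploy
  script:
    - echo "environment stopped"
  environment:
    name: review/$CI_COMMIT_REF_SLUG
    action: stop
  when: manual
```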
To be discussed
- Error handling: What happens when the "pre-job" Kubernetes-related calls fail?
  - Should it fail the job?
  - What is the timeout for these "pre" tasks?
  - Does re-running the job re-run the "pre" tasks too?
- How to configure the agent to enable these features?
- Should we make this functionality available outside of CI?
- What values and functions are available for the template?
Intended users
Feature Usage Metrics
Does this feature require an audit event?
- we should re-use the `deployment_started` event when a deployment to a protected environment starts
- we should add audit events for `environment_stopped` and `environment_created` when protected environments are stopped or created
- separate issues opened
References
- A collection of requirements when we still remembered the issues of the GitLab managed resources from 2022
- Related documentation pages:
- Environment CRD issue (closed)