Kubernetes GitOps PoC Tool Discussion
To keep the epic cleaner, I've added this issue for the discussion around the tool options for our Kubernetes GitOps PoC.
GitOps Tool Comparison/Discussion
As we have a few discussions in flight around the use of GitOps/CD for Kubernetes within GitLab, I'd like to do my own breakdown/summary of our options here.
This is not an extensive analysis, simply a short overview of how each tool works and its main pros and cons.
As our adoption of Kubernetes has grown, our current tooling and pipelines, specifically around the continuous deployment side, are no longer sufficient for our needs.
Goals
To start our comparison we need to have some specific goals in mind.
While we have multiple teams looking in this space, these goals are specifically aimed at internal tooling at GitLab and the pain points we currently face from an infrastructure perspective, not a product perspective.
- Proper state, diff, sync management, including live diffs of running applications.
- Improved visibility into Kubernetes deployments/services.
- Simplified rollback, history and versioning.
- Automatic installation and templating of cluster addons for new clusters (think external-dns, cert-manager, vault-secrets, etc).
- Custom Resource Definition (CRD) management.
- Isolation and reduced blast radius: one application change shouldn't trigger a full environment pipeline.
- Ability to remove the need for writable kubectl access for day to day operations.
- Easier onboarding to Kubernetes, debugging, and understanding of application deployments.
- Flexible templating system to ease migration and future initiatives.
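To make the first goal (state, diff and sync management) more concrete, here is a minimal sketch of a desired-vs-live diff. It is only an illustration of the idea, not how any of the tools below implement it: the function name `diff_state`, the sample manifests and the registry URL are all hypothetical.

```python
from typing import Any


def diff_state(desired: dict[str, Any], live: dict[str, Any], path: str = "") -> list[str]:
    """Return readable differences between desired (Git) and live (cluster) state.

    Hypothetical helper for illustration only; real tools also handle lists,
    server-side defaults and ignored fields.
    """
    changes: list[str] = []
    for key in sorted(set(desired) | set(live)):
        here = f"{path}.{key}" if path else key
        if key not in live:
            changes.append(f"+ {here}: missing in cluster")
        elif key not in desired:
            changes.append(f"- {here}: not tracked in Git")
        elif isinstance(desired[key], dict) and isinstance(live[key], dict):
            # Recurse into nested mappings such as spec/metadata.
            changes.extend(diff_state(desired[key], live[key], here))
        elif desired[key] != live[key]:
            changes.append(f"~ {here}: {live[key]!r} -> {desired[key]!r}")
    return changes


# Assumed example: someone scaled the deployment by hand with kubectl.
desired = {"spec": {"replicas": 3, "image": "registry.example.com/app:1.2.0"}}
live = {"spec": {"replicas": 5, "image": "registry.example.com/app:1.2.0"}}

drift = diff_state(desired, live)
print(drift)
```

A sync is then conceptually "apply the desired state until the diff is empty", which is also what makes rollback a matter of pointing at an earlier Git revision.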
Brief product comparison
We are primarily looking at 3 products/projects here. Below is a brief breakdown of some of the features of each.
Flux
Flux is very bare bones and quite simple to stand up and operate.
It is heavily used by other GitOps solutions as the backend due to its modular approach.
It is probably the closest alternative to the GitLab Kubernetes Agent as it currently stands.
It focuses on the mainstream templating tools, Kustomize and Helm, and also supports raw manifests.
- Current Contributors: 118
- Uses the GitOps Toolkit.
- Agent based - requires agent deployment to target cluster.
- No official UI; the closest option is https://github.com/weaveworks/weave-gitops.
- Supports only Helm and Kustomize as templating languages; other tools are supported by third party plugins/integrations.
- Highly modular, community extensions/features.
- Leverages the release functionality of the templating system (e.g. Helm releases); limited diff capabilities.
- CLI Tool.
ArgoCD
ArgoCD is the most mature and widely adopted of the projects compared here.
It is the only option here that does not require an agent; it talks directly to clusters via their management API.
It provides templating support by calling the templating tools' clients directly, which can be extended via custom deployment commands or plugins.
- Current Contributors: 854
- Uses the GitOps Engine.
- No Agent - can talk to clusters directly over management API endpoint.
- Rollback/Sync/Diff/History etc.
- Rich UI with web based terminal.
- Converts all templating to raw Kubernetes manifests, which allows it to support any templating engine/tool with unified sync/diff logic.
- Multi Cluster management via Single API.
- ApplicationSets/App of Apps to dynamically target and deploy the same app to multiple clusters.
- Plugin system and Webhooks to alter deployment logic or run pre-deployment tooling/scripts.
- CLI Tool.
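The ApplicationSets idea above (one template, many target clusters) can be sketched roughly as follows. This is a plain Python illustration of the fan-out concept, not ArgoCD's API: the cluster inventory, URLs, the `render_applications` helper and the simplified field names (loosely modelled on an Application's source/destination) are all assumptions.

```python
from string import Template

# Hypothetical cluster inventory; names and URLs are made up for illustration.
CLUSTERS = [
    {"name": "gstg", "server": "https://gstg.example.com"},
    {"name": "gprd", "server": "https://gprd.example.com"},
]


def render_applications(app_name: str, repo_url: str, path_template: str,
                        clusters: list[dict[str, str]]) -> list[dict]:
    """Expand one application template into one application per cluster."""
    apps = []
    for cluster in clusters:
        apps.append({
            "name": f"{app_name}-{cluster['name']}",
            "source": {
                "repoURL": repo_url,
                # $cluster in the path template is replaced per target cluster.
                "path": Template(path_template).substitute(cluster=cluster["name"]),
            },
            "destination": {"server": cluster["server"], "namespace": app_name},
        })
    return apps


apps = render_applications(
    "external-dns",
    "https://git.example.com/infra/addons.git",
    "addons/external-dns/overlays/$cluster",
    CLUSTERS,
)
for app in apps:
    print(app["name"], "->", app["destination"]["server"])
```

Adding a new cluster to the inventory is then enough to get the addon rolled out there, which is what makes this pattern relevant to the cluster addon goal listed earlier.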
GitLab Kubernetes Agent
The GitLab Kubernetes Agent is our own product. It has the benefit of direct integration into our GitLab product/ecosystem.
- Current Contributors: GitLab Maintained.
- Uses cli-utils.
- Agent based - requires agent deployment to target cluster.
- Agents are "attached" to GitLab projects.
- RBAC granularity for users is limited due to the tie-in to the GitLab permission model.
- Limited UI, in development.
- Supports raw manifest sync, Helm support is experimental.
- Limited diff capabilities.
Dogfooding
One of our core engineering principles at GitLab is dogfooding.
With any decision like this we have to look at whether or not dogfooding is viable for us.
We must also balance this with our need to provide a stable and reliable product for our customers (see reliability: https://about.gitlab.com/handbook/engineering/development/principles/#the-importance-of-reliability).
Speaking specifically from within our own infrastructure, dogfooding something of this nature carries the potential to hinder us with a less mature product, or one that is under design consideration as we speak.
It is also important to highlight that not dogfooding as the primary choice here does not mean we can't dogfood at all.
We have many environments and we can always aim to dogfood in a smaller environment. In fact, we already use KAS for the design app.
Additionally, having another leading tool in this specific space running in our own environment adds value to our own product by giving us a direct comparison to work against, with real day to day workloads/operations.
Tool Migrations
Any tool we consider in this space, from an infrastructure perspective, is not a lock-in.
All tools here require integration work, but they all leverage templating engines that we already use, so no changes to manifests would be necessary.
Related discussions
- Details on why we think ArgoCD might be the best choice for our current state of configuration management in Infra: gitlab-org/gitlab#357947 (comment 1182940948)
- Notes from Product on using Flux: https://docs.google.com/document/d/1hzMSRSqtMe_cGL85_vfqedqHZn877wcZO1uoru2lVLo/edit#heading=h.z3nzk2fik83c with a corresponding MR proposal gitlab-org/gitlab!105922 (closed)
- Shall we switch to Flux (gitlab-org/gitlab#357947 - closed)