Consider removing the Operator's dependency on the GitLab Helm Charts
Context
The following issues led to using the GitLab Helm Chart within the GitLab Operator as the source for Kubernetes Objects:
- gitlab-org/cloud-native/gitlab-operator#1 (closed)
- gitlab-org/cloud-native/gitlab-operator#2 (closed)
- gitlab-org/cloud-native/gitlab-operator#4 (closed)
- gitlab-org/cloud-native/gitlab-operator#18 (closed)
This was an excellent approach to speed development to get us to Operator GA (gitlab-org&5486 (closed)) because we didn't need to redefine much of the logic that the Charts have implemented.
However, it's worth considering the longer-term feasibility of this architectural decision.
Proposal
I propose that we consider (eventually) moving toward the inverse: use the Operator as the source of truth, and provide the option to deploy it with a lightweight Helm chart.
Motivation
Some primary drivers for this proposal are the following:
- Updates to the Chart need to be considered by the Operator, often requiring duplication of logic and therefore extending the engineering effort required for any given deliverable
- Updates in the Operator to reflect changes in the Chart (a leaky abstraction) open potential failure points
- The Operator ingesting the Charts leads to a more convoluted release pattern (see gitlab-org/cloud-native/gitlab-operator#224 (closed))
- We have to duplicate the process of writing tests and running them in CI across both Charts and Operator projects, increasing release cycles and opening many more points of failure
- We were unable to do adequate validation in our AdmissionWebhook because the Helm template took too long to render (gitlab-org/cloud-native/gitlab-operator#321 (closed))
- There has already been a request to create a Helm chart to deploy the Operator (gitlab-org/cloud-native/gitlab-operator#481 (closed)), which means we would end up with a Helm chart that deploys the Operator, which in turn deploys a Helm chart underneath. That pattern would be uncommon and unnecessarily complex
Supporting information
This proposed model is already popular with other projects. Some examples include:
- NGINX Ingress Controller (chart in subpath charts/ingress-nginx)
- CertManager (chart in subpath deploy/charts/cert-manager)
- MinIO (chart in subpath helm/operator)
Benefits
Some benefits of this approach include:
- Single source of truth: logic exists only in the Operator, rather than needing to duplicate some logic from the Chart into the Operator (leading to leaky abstractions)
- Enhanced capabilities: as a long-running process in the cluster, the Operator can always offer greater functionality than Helm, which runs only when it is invoked
- Familiar design pattern: this pattern aligns with other popular cloud native components, lowering the barrier to entry for contributions
- Ease of implementation with Golang: complex logic is arguably easier to implement and test in a robust programming language like Golang than in the Helm templating language, which can be difficult to learn, write, and debug
- Others?
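To make the "single source of truth" and "ease of implementation" points more concrete, here is a rough sketch of what building an object directly in Go might look like, with defaults and validation living in one place. Everything in it is hypothetical: `WebserviceSpec`, `Deployment`, and `BuildDeployment` are simplified stand-ins invented for illustration, not existing Operator or Kubernetes types.

```go
package main

import (
	"errors"
	"fmt"
)

// WebserviceSpec is a hypothetical slice of a custom resource spec.
type WebserviceSpec struct {
	Name     string
	Image    string
	Replicas int
}

// Deployment is a simplified stand-in for a Kubernetes Deployment object.
type Deployment struct {
	Name     string
	Image    string
	Replicas int
	Labels   map[string]string
}

// BuildDeployment derives the desired object from the spec. Validation and
// defaulting happen here, so there is no second copy of this logic in a chart.
func BuildDeployment(spec WebserviceSpec) (Deployment, error) {
	if spec.Name == "" {
		return Deployment{}, errors.New("name must not be empty")
	}
	if spec.Image == "" {
		return Deployment{}, errors.New("image must not be empty")
	}

	// Default to a single replica when the value is unset or invalid.
	replicas := spec.Replicas
	if replicas <= 0 {
		replicas = 1
	}

	return Deployment{
		Name:     spec.Name,
		Image:    spec.Image,
		Replicas: replicas,
		Labels:   map[string]string{"app": spec.Name},
	}, nil
}

func main() {
	d, err := BuildDeployment(WebserviceSpec{
		Name:  "webservice",
		Image: "example/webservice:1.0", // placeholder image name
	})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%s: image=%s replicas=%d\n", d.Name, d.Image, d.Replicas)
}
```

Because a function like this is plain Go, it can be covered by ordinary unit tests and called from an admission webhook without rendering a Helm template first.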
Drawbacks
Some potential drawbacks include:
- Required knowledge of Golang: contributors would need to know Golang. This is somewhat mitigated by how commonly it is used in cloud-native projects, and balanced by the fact that the alternative (Helm templating) can be difficult in its own right
- Significant conversion work: this approach would require significant engineering effort to convert the Helm Charts' logic into the Operator
- Others?
Summary
Overall, the purpose here is to evaluate the longer-term design of the Operator to ensure that we are positioning ourselves to make timely, effective updates to the product. As always, all opinions and thoughts are welcome.