Consider removing the Operator's dependency on the GitLab Helm Charts
The following issues led to using the GitLab Helm Chart within the GitLab Operator as the source for Kubernetes Objects:
- gitlab-org/cloud-native/gitlab-operator#1 (closed)
- gitlab-org/cloud-native/gitlab-operator#2 (closed)
- gitlab-org/cloud-native/gitlab-operator#4 (closed)
- gitlab-org/cloud-native/gitlab-operator#18 (closed)
This was an excellent approach to speed development to get us to Operator GA (gitlab-org&5486 (closed)) because we didn't need to redefine much of the logic that the Charts have implemented.
However, it's worth considering the longer-term feasibility of this architecture decision.
I propose that we consider (eventually) moving toward the inverse: use the Operator as the source of truth, and provide the option to deploy it with a lightweight Helm chart.
Some primary drivers for this proposal are the following:
- Updates to the Chart need to be considered by the Operator, often requiring duplication of logic and therefore extending the engineering effort required for any given deliverable
- Updates in the Operator to reflect changes in the Chart (a leaky abstraction) open potential failure points
- The Operator ingesting the Charts leads to a more convoluted release pattern (see gitlab-org/cloud-native/gitlab-operator#224 (closed))
- We have to duplicate the process of writing tests and running them in CI across both the Charts and Operator projects, lengthening release cycles and opening many more points of failure
- We were unable to do adequate validation in our AdmissionWebhook because the Helm template took too long to render (gitlab-org/cloud-native/gitlab-operator#321 (closed))
- There has already been a request to create a Helm chart to deploy the Operator (gitlab-org/cloud-native/gitlab-operator#481 (closed)), which means we would end up with a Helm chart that deploys the Operator, which in turn deploys a Helm chart underneath, a pattern that would be uncommon and unnecessarily complex
This proposed model is already popular with other services. Some examples include:
- NGINX Ingress Controller (chart in subpath)
- CertManager (chart in subpath)
- MinIO (chart in subpath)
Some benefits of this approach include:
- Single source of truth: logic exists only in the Operator, rather than needing to duplicate some logic from the Chart into the Operator (leading to leaky abstractions)
- Enhanced capabilities: as a long-running process in the cluster, the Operator will always be able to offer greater functionality than Helm, which is not a long-running process
- Familiar design pattern: this pattern aligns with other popular cloud native components, lowering the barrier to entry for contributions
- Ease of implementation with Golang: complex implementations are arguably easier to implement and test using a robust programming language like Golang compared to Helm templating language, which can be difficult to learn, implement, and debug
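To make the last point concrete, here is a minimal sketch of what defaulting and validation logic could look like when written directly in Go rather than in Helm templates. It uses only the standard library. `WebserviceSpec`, `Deployment`, `Validate`, and `BuildDeployment` are hypothetical names invented for illustration, not the Operator's actual API, and the default values are assumptions.

```go
package main

import (
	"errors"
	"fmt"
)

// WebserviceSpec is a hypothetical, heavily simplified slice of a custom
// resource. It is NOT the Operator's real API.
type WebserviceSpec struct {
	Name     string
	Replicas int
	Image    string
}

// Deployment is a stand-in for a Kubernetes Deployment, reduced to the
// few fields this sketch needs.
type Deployment struct {
	Name     string
	Replicas int
	Image    string
	Labels   map[string]string
}

// Validate rejects specs that could never produce a working Deployment.
func (s WebserviceSpec) Validate() error {
	if s.Name == "" {
		return errors.New("name must not be empty")
	}
	if s.Replicas < 0 {
		return errors.New("replicas must not be negative")
	}
	return nil
}

// BuildDeployment validates the spec, applies defaults, and returns the
// desired object. The defaults here are assumptions for illustration.
func BuildDeployment(s WebserviceSpec) (Deployment, error) {
	if err := s.Validate(); err != nil {
		return Deployment{}, fmt.Errorf("invalid spec: %w", err)
	}
	if s.Replicas == 0 {
		s.Replicas = 1 // assumed default
	}
	if s.Image == "" {
		s.Image = "example.invalid/webservice:latest" // placeholder image
	}
	return Deployment{
		Name:     s.Name + "-webservice",
		Replicas: s.Replicas,
		Image:    s.Image,
		Labels:   map[string]string{"app": "webservice", "release": s.Name},
	}, nil
}

func main() {
	d, err := BuildDeployment(WebserviceSpec{Name: "demo"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%s replicas=%d image=%s\n", d.Name, d.Replicas, d.Image)
}
```

Because the logic is a plain function, it can be covered by ordinary Go unit tests and reused from a validating webhook without rendering any templates.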
Some potential drawbacks include:
- Required knowledge of Golang to contribute: contributors would need to know Golang, but this is somewhat mitigated by how commonly it is used in cloud-native projects, and is also balanced by the fact that the alternative (Helm templating) can be difficult in its own right
- Significant conversion work: this approach would require significant engineering effort to port the Helm Charts' logic into the Operator
Overall, the purpose here is to evaluate the longer-term design of the Operator to ensure that we are positioning ourselves to make timely, effective updates to the product. As always, all opinions and thoughts welcome.