Operator pattern for Kubernetes Agent modules

via #329773 (comment 815851846)

In terms of where to go from here, I was just looking at the Flux architecture/API and it makes a ton of sense to me: https://fluxcd.io/docs/components/

It consists of several independent controllers that, when composed together, form an end-to-end GitOps "pipeline". I find this architecture very appealing because components are maintained independently. Rather than hardcoding a bunch of behaviors into agentk and/or being super opinionated about how agent config/manifest projects work, we could just focus on providing good abstractions that enable users to do whatever they want.

In other words, agentk would basically become a "controller of controllers" and its responsibility would be to manage the lifecycle of itself and all the other components. That way users don't have to explicitly deploy new versions of the agent to get new features; they are rolled out automatically when the version of kas they are connected to changes.
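To make the "controller of controllers" idea a bit more concrete, here is a minimal sketch in Go of the comparison such a reconcile loop might perform. Everything in it is an assumption for illustration: the `Action` type, the `Reconcile` function, the component name `gitops-controller`, and the version strings are hypothetical and are not part of agentk or kas. It assumes kas can tell the agent which version of each component should be running.

```go
package main

import (
	"fmt"
	"sort"
)

// Action is a hypothetical description of one change the agent would make.
// An empty From means the component is not deployed yet and must be installed.
type Action struct {
	Component string
	From      string
	To        string
}

// Reconcile compares the versions kas says should be running (desired)
// against what is currently deployed in the cluster and returns the
// installs/upgrades needed to converge. Both maps are keyed by component
// name. Components that are deployed but absent from desired are left alone.
func Reconcile(desired, deployed map[string]string) []Action {
	var actions []Action
	for name, want := range desired {
		have := deployed[name] // "" if the component is missing
		if have != want {
			actions = append(actions, Action{Component: name, From: have, To: want})
		}
	}
	// Map iteration order is random in Go, so sort for a stable result.
	sort.Slice(actions, func(i, j int) bool {
		return actions[i].Component < actions[j].Component
	})
	return actions
}

func main() {
	// Illustrative values only; real component names and versions would
	// come from kas and from the cluster.
	desired := map[string]string{
		"agentk":            "14.8.0",
		"gitops-controller": "14.8.0",
	}
	deployed := map[string]string{
		"agentk":            "14.7.0",
		"gitops-controller": "14.8.0",
	}
	for _, a := range Reconcile(desired, deployed) {
		fmt.Printf("upgrade %s: %q -> %q\n", a.Component, a.From, a.To)
	}
}
```

In a real implementation this comparison would presumably run whenever the agent observes a new kas version, and each resulting action would be applied by updating the corresponding Deployment or custom resource rather than printed.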

Note the original context was specifically related to GitOps, but I think the same principle should apply to virtually all features developed for the Agent.

If we are successful, this should provide a clear path for more sophisticated users to extend Agent functionality either by writing their own custom controllers or "teaching" the Agent about how to interpret CRDs owned by third-party controllers. Similarly, it would allow different product groups to fully own their respective Kubernetes integrations rather than ~"group::configure" being on the hook for functionality they don't maintain.

Use Cases

Maintainability

Historically, ~"group::configure" has ended up on the hook for the long-term maintenance of things like CI templates that power Auto DevOps, Helm charts for Cluster Applications, end-user application deployments, and now Kubernetes Agent modules that power things like Container Network Policies.

Many such things are not core responsibilities of Configure. In order to remain efficient, it is critical that we continue to organize ourselves tightly by function. I think it makes more sense to have Configure focus on the core Kubernetes integration as a means to enable other teams. The operator pattern provides us with a means to clearly separate responsibilities.

Automatic Upgrades

Something fairly unique to the KAS architecture is that it includes a standalone component that exists outside of the cluster environment for orchestrating workloads; most similar offerings have no such component. We can use this to our advantage in some interesting ways.

The most obvious is automatic upgrades of "first party" components that are deployed in the user's cluster. This could include things like:

- Agent itself
- GitLab Runner instances
- OpsTrace instances
- OpsTrace telemetry collectors

For larger customers with many distinct engineering teams, each operating these components on their own, it can be quite burdensome to ensure group and project-scoped components (like runners) are upgraded alongside major version upgrades to their GitLab instance.

Competition

GitHub's actions-runner-controller gives you Kubernetes-native auto-scaling runner fleet management:
- Deep integration with GitHub
- Automatically manages per-project runner registration and credentials
- Super robust autoscaling with the ability to scale based on metrics, % busy, webhooks from GH, and fixed schedules

Edited Feb 18, 2022 by Marshall Cottrell