Rework GitLab self monitoring project to be powered by kas/agentk
Background
The deprecation of "one-click" managed app installs &4280 leaves us with many open questions regarding how GitLab can continue to communicate with in-cluster services such as Prometheus and ElasticStack (#218220 (comment 489463559), #292460 (comment 485945859)). It also gives us the opportunity to consider how we can address many other long-standing issues related to (re-)using the same cluster(s) across multiple integrations (#5254 (closed), #26887 (closed), #28415 (closed), etc).
The new GitLab Kubernetes Agent could help us address most or all of these issues, but there are a lot of implementation details to sort out before we can make the agent responsible for all of the ways GitLab interfaces with Kubernetes clusters today.
Goals
- Generate opportunities for outside contribution to the agent and GitLab <-> kas interactions in a focused area with well-defined goals
- A solid use case for GL operators to enable kas in their existing production deployments and provide feedback without having to buy into or support anything new
- Provide a test bed for implementing the agentk modules necessary for GitLab to use kas as a proxy to the k8s API server, in-cluster Prometheus, etc.
- Help us define what the agent configuration should look like to support these use cases
- Enhance GitLab self monitoring projects with more information about the Kubernetes infrastructure the instance is running on (if it is)
- Lay groundwork for leveraging the agent architecture for userland project deployments while maintaining parity with the functionality provided by GitLab-managed clusters and apps/integrations
Proposal
For cloud native GitLab Helm chart deployments on Kubernetes (and when kas is enabled) we will also install an agent in the cluster, configured to use the GitLab self monitoring project as its config project. This agent will act as a proxy for the self monitoring project to query the state of the infrastructure the instance is running on.
Implementation
Phase 1
- implement agentk module for querying the k8s API server (gitlab-org/cluster-integration/gitlab-agent!255 (merged))
- update GitLab helm chart to deploy an agent when kas is enabled
- update self_monitoring/project/create_service.rb to (optionally) set itself up as the config project for the installed agent
- piggy-back on top of the UI defined in #277323 (closed) to surface basic information about the GitLab deployment like:
  - status of this agent and perhaps all other agents that are integrated with the instance
  - view pod logs for various GitLab components
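As a rough illustration of the kind of request the Phase 1 module would need to relay, the sketch below builds the Kubernetes API paths for listing pods and reading pod logs. The paths and the container / tailLines query parameters are standard Kubernetes API; the function names and the "gitlab" namespace in the usage are assumptions made for illustration, and how kas would expose these paths to GitLab is not specified in this issue.

```python
from typing import Optional
from urllib.parse import quote, urlencode


def pod_list_path(namespace: str) -> str:
    """Kubernetes API path for listing the pods in a namespace."""
    return f"/api/v1/namespaces/{quote(namespace, safe='')}/pods"


def pod_log_path(namespace: str, pod: str,
                 container: Optional[str] = None,
                 tail_lines: Optional[int] = None) -> str:
    """Kubernetes API path for reading the logs of a single pod.

    `container` and `tail_lines` map to the API's `container` and
    `tailLines` query parameters and are omitted when not given.
    """
    path = f"{pod_list_path(namespace)}/{quote(pod, safe='')}/log"
    params = {}
    if container is not None:
        params["container"] = container
    if tail_lines is not None:
        params["tailLines"] = str(tail_lines)
    return f"{path}?{urlencode(params)}" if params else path


# Hypothetical usage: the namespace and pod name are made up.
print(pod_log_path("gitlab", "gitlab-webservice-0", tail_lines=100))
```

Keeping path construction separate from transport means the same helpers would work whether the request ends up going directly to the API server or through the agent.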
Phase 2
- implement agentk module to act as a reverse proxy for querying in-cluster Prometheus
- upgrade self monitoring project to proxy PromQL queries through kas rather than connecting to the internal Prometheus address directly when displaying metrics dashboards
- investigate options for making metric dashboards more "data driven" based on what is exposed by the agent and/or consider moving dashboard configuration to the agent config project
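To make the Phase 2 switch concrete, here is a minimal sketch of building a Prometheus instant-query URL against either base address. The /api/v1/query endpoint and its query parameter are part of the Prometheus HTTP API; both base URLs below, and in particular the kas proxy path, are hypothetical placeholders, since the proxy route has not been designed yet.

```python
from urllib.parse import urlencode

# Hypothetical base URLs, for illustration only. Neither is defined by
# this proposal; the kas proxy path in particular is an assumption.
DIRECT_PROMETHEUS = "http://prometheus-server.gitlab.svc:9090"
KAS_PROXY = "https://kas.example.com/hypothetical/agents/1/prometheus"


def instant_query_url(base_url: str, promql: str) -> str:
    """Build a Prometheus instant-query URL for the given base address."""
    query_string = urlencode({"query": promql})
    return f"{base_url.rstrip('/')}/api/v1/query?{query_string}"


# The PromQL expression is unchanged; only the base address differs.
print(instant_query_url(DIRECT_PROMETHEUS, 'up{job="gitlab"}'))
print(instant_query_url(KAS_PROXY, 'up{job="gitlab"}'))
```

If the proxy preserves the Prometheus API surface like this, the dashboard code would only need a different base address rather than a new query format.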
Phase 3
- visualize the running pods and their health (similar to project deploy boards)
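One possible shape for the data behind such a view is sketched below: it reduces a Kubernetes PodList response (already parsed from JSON) to per-phase counts and a list of pods needing attention. The fields used (items, metadata.name, status.phase, status.containerStatuses[].ready) are standard Kubernetes; the function name, the definition of "unhealthy", and the sample pod names are assumptions for illustration.

```python
from collections import Counter


def summarize_pods(pod_list: dict) -> dict:
    """Summarize a parsed Kubernetes PodList into phase counts and unhealthy pods.

    Assumed definition: a pod is unhealthy if it is neither Running nor
    Succeeded, or if it is Running but not every container reports ready.
    """
    phases = Counter()
    unhealthy = []
    for pod in pod_list.get("items", []):
        status = pod.get("status", {})
        phase = status.get("phase", "Unknown")
        phases[phase] += 1
        containers = status.get("containerStatuses", [])
        all_ready = bool(containers) and all(c.get("ready", False) for c in containers)
        if phase not in ("Running", "Succeeded") or (phase == "Running" and not all_ready):
            unhealthy.append(pod["metadata"]["name"])
    return {"phases": dict(phases), "unhealthy": sorted(unhealthy)}


# Hypothetical sample data shaped like a PodList response.
sample = {"items": [
    {"metadata": {"name": "web-0"},
     "status": {"phase": "Running", "containerStatuses": [{"ready": True}]}},
    {"metadata": {"name": "sidekiq-0"},
     "status": {"phase": "Running", "containerStatuses": [{"ready": False}]}},
    {"metadata": {"name": "migrations-1"},
     "status": {"phase": "Succeeded", "containerStatuses": [{"ready": False}]}},
    {"metadata": {"name": "gitaly-0"}, "status": {"phase": "Pending"}},
]}
summary = summarize_pods(sample)
```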
Phase 4
- begin thinking about how we can apply this as a general solution for monitoring/metrics on all projects deployed via agents (#280563 (closed), #299350 (closed))
- implement any other agentk modules necessary to support integrations that aren't being deprecated, e.g. querying pod logs from ElasticStack
- consider even broader deprecations related to GitLab's direct interaction with clusters in favour of the agent architecture
/cc @nagyv-gitlab @tkuah @ash2k WDYT?