Deploy Prometheus to monitor customer apps on Kubernetes
Note: The general design for applications is discussed in #38464.
As our support for Prometheus continues to grow, we should offer the ability to automatically deploy Prometheus servers and configure them to monitor a project's various environments. With GitLab's multi-cluster support, this can mean either a single Prometheus server or several for a given project.
Right now, we either ask the customer to bring their own Prometheus server or enable Kubernetes (k8s) monitoring on the bundled Prometheus server. Neither is a good long-term option, however:
- We should not require customers to set up and configure Prometheus themselves. With Kubernetes, spinning up a lightweight app like Prometheus is easy, and we should just take care of it.
- The bundled Prometheus instance should be used primarily to monitor the GitLab service itself. It may not have network reachability to all environments of a project, and best practice for Prometheus is to use multiple servers for different monitoring tasks.
- Dynamic environments like Review Apps, in particular, pose a challenge. These environments start and stop frequently, so their scrape targets change frequently as well. A customer admin has no way to know these targets, and thus define them, ahead of time. Similarly, combining GitLab's own scrape targets and their permutations with those of multiple GitLab projects, each with multiple environments, in a single configuration is a significant challenge.
For these reasons, we should add Prometheus to GitLab's suite of managed apps on Kubernetes clusters. We can then use the Helm chart to manage the configuration and update it on demand. This model has a number of benefits:
- The Prometheus server will be running where the environments are, allowing access to likely private scrape targets.
- Configuration complexity will be reduced, because each server only needs to know the scrape targets for its own target environment(s).
- Fewer scalability challenges.
- Aligning to Prometheus best practices.
We should consider leveraging the ability to port-forward to Kubernetes pods, so that no external access is required when GitLab and the Prometheus server are not running in the same network segment.
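To make the port-forwarding idea concrete, here is a minimal sketch of what reaching Prometheus through a forwarded port could look like. It only builds the `kubectl port-forward` command and the URL for the Prometheus HTTP query API. The pod name, the namespace, and the use of `kubectl` (rather than the Kubernetes API that GitLab would more likely call directly) are assumptions for illustration, not a proposed implementation.

```python
import urllib.parse


def port_forward_command(pod, namespace, local_port=9090, remote_port=9090):
    """Build the kubectl command that forwards a local port to a pod.

    The pod and namespace names are supplied by the caller; nothing here
    is GitLab-specific. 9090 is the default Prometheus listen port.
    """
    return [
        "kubectl", "port-forward",
        "--namespace", namespace,
        f"pod/{pod}",
        f"{local_port}:{remote_port}",
    ]


def query_url(expr, local_port=9090):
    """Build the URL for an instant query against the forwarded port."""
    params = urllib.parse.urlencode({"query": expr})
    return f"http://127.0.0.1:{local_port}/api/v1/query?{params}"


if __name__ == "__main__":
    # Hypothetical pod and namespace names, for illustration only.
    print(" ".join(port_forward_command("prometheus-server-0", "monitoring")))
    print(query_url("up"))
```

With the forward in place, all traffic goes through the Kubernetes API server, so the Prometheus pod does not need an externally reachable service or ingress.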
For the configuration itself, since Prometheus will be running in Kubernetes, we can leverage the standard Kubernetes service discovery built into Prometheus:
- Add a scrape target for annotated services or pods
- Collect node stats from cAdvisor
- In the future, we can consider allowing custom scrape targets as well
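The first two items above could translate into a scrape configuration along the following lines. This is a sketch under stated assumptions, not the configuration the Helm chart will ship: the `prometheus.io/scrape` annotation is a common community convention rather than something Prometheus defines, the job names and target label names are made up for illustration, and the cAdvisor path and in-cluster credential paths depend on the cluster version and setup. The script emits JSON, which is also valid YAML, purely to keep the example dependency-free.

```python
import json
import re


def annotation_label(annotation):
    """Return the meta label Prometheus exposes for a pod annotation.

    Kubernetes service discovery publishes pod annotations as
    __meta_kubernetes_pod_annotation_<name>, with unsupported characters
    replaced by underscores.
    """
    return "__meta_kubernetes_pod_annotation_" + re.sub(
        r"[^a-zA-Z0-9_]", "_", annotation
    )


def annotated_pod_job(annotation="prometheus.io/scrape"):
    """Scrape only pods that opt in through an annotation set to "true"."""
    return {
        "job_name": "kubernetes-pods",  # illustrative name
        "kubernetes_sd_configs": [{"role": "pod"}],
        "relabel_configs": [
            # Drop every discovered pod that is not annotated for scraping.
            {
                "source_labels": [annotation_label(annotation)],
                "action": "keep",
                "regex": "true",
            },
            # Copy namespace and pod name into ordinary labels.
            {
                "source_labels": ["__meta_kubernetes_namespace"],
                "target_label": "kubernetes_namespace",
            },
            {
                "source_labels": ["__meta_kubernetes_pod_name"],
                "target_label": "kubernetes_pod_name",
            },
        ],
    }


def cadvisor_job():
    """Collect container stats from cAdvisor on each node's kubelet.

    Assumption: the kubelet serves cAdvisor metrics at /metrics/cadvisor
    and Prometheus runs in-cluster with a mounted service account.
    """
    sa_dir = "/var/run/secrets/kubernetes.io/serviceaccount"
    return {
        "job_name": "kubernetes-cadvisor",  # illustrative name
        "scheme": "https",
        "metrics_path": "/metrics/cadvisor",
        "kubernetes_sd_configs": [{"role": "node"}],
        "tls_config": {"ca_file": f"{sa_dir}/ca.crt"},
        "bearer_token_file": f"{sa_dir}/token",
    }


def build_config():
    """Assemble both jobs into one Prometheus configuration document."""
    return {"scrape_configs": [annotated_pod_job(), cadvisor_job()]}


if __name__ == "__main__":
    print(json.dumps(build_config(), indent=2))
```

Because targets are discovered at scrape time, short-lived environments such as Review Apps are picked up and dropped automatically, without anyone editing a static target list.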
Note: The general design and user experience for installing applications in the Cluster page is found in #38464
Links / references